The Forge Cross-Platform Rendering Framework PC Windows, Steamdeck (native), Ray Tracing, macOS / iOS, Android, XBOX, PS4, PS5, Switch, Quest 2
APACHE-2.0 License
Published by AntoineConffx about 2 months ago
We helped Skydance Interactive optimize Behemoth last year. Click on the image below to see the announcement trailer:
This unit test is based on some of our research into software rasterization and GPU-driven rendering: a particle system running entirely in very few compute shaders, with one large buffer holding most of the data. As with all things GPU-driven, the trick is to execute one compute shader once over one buffer to reduce read/write memory bandwidth. Although this is not new wisdom, you would be surprised how many particle systems still get this wrong ... having a compute shader for each stage of the particle lifetime, or even worse, doing most of the particle work on the CPU.
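The single-buffer, single-dispatch idea can be illustrated with a hedged CPU sketch (all names and the particle layout below are invented for illustration, not the actual unit-test code): every particle slot lives in one buffer, and one pass per frame handles emission, simulation, and aging, exactly the way one compute dispatch would touch each slot once.

```cpp
#include <cassert>
#include <vector>

// Hypothetical particle layout: one large buffer holds all state.
// age < 0 marks a dead/free slot that the same pass can re-emit into.
struct Particle {
    float pos[3];
    float vel[3];
    float age;      // seconds alive; < 0 means dead / free slot
    float lifetime; // seconds until the slot is recycled
};

// One "dispatch": every slot is read and written exactly once per frame,
// covering emission, integration, and expiry in a single pass.
void updateParticles(std::vector<Particle>& buf, float dt) {
    for (Particle& p : buf) {            // one GPU thread per particle
        if (p.age < 0.0f) {              // dead slot: emit a new particle
            p = Particle{{0, 0, 0}, {0, 1, 0}, 0.0f, 5.0f};
            continue;
        }
        p.age += dt;
        if (p.age >= p.lifetime) {       // expired: mark the slot free
            p.age = -1.0f;
            continue;
        }
        for (int i = 0; i < 3; ++i)      // integrate in the same pass
            p.pos[i] += p.vel[i] * dt;
    }
}
```

Calling `updateParticles` once per frame is the CPU analogue of the single compute dispatch; no separate emit/simulate/kill shaders and no extra buffer traffic between stages.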
This particle system was demoed in several talks last September, running on a Samsung S22. Here are the slides:
http://www.conffx.com/WolfgangEngelParticleSystem.pptx
It is meant to be used to implement next-gen mega particle systems that always simulate hundreds of thousands or millions of particles at once, instead of the few dozen that contemporary systems simulate.
This screenshot shows 4 million firefly-like particles, with 10,000 lights attached to them and a shadow for the directional light. Those numbers were previously thought to be impossible on mobile phones.
Same setting as above, but this time with eight point-light shadows in addition.
We now have the new compute-based TVB 2.0 approach running on all platforms (on Android, only on the S22 so far). You can download the slides from the I3D talk from
Published by AntoineConffx 6 months ago
We are giving a talk about our latest Visibility Buffer research at I3D. Here is a short primer on what it is about:
The original idea of the Triangle Visibility Buffer is based on an article by [burns2013]. [schied2015] and [schied2016] extended what was described in the original article. Christoph Schied implemented a modern version with an early version of OpenGL (supporting MultiDrawIndirect) in The Forge rendering framework in September 2015.
We ported this code to all platforms and simplified and extended it in the following years by adding a triangle filtering stage following [chajdas] and [wihlidal17] and a new way of shading.
Our ongoing improvements simplified the approach incrementally, and the architecture started to resemble what was described in the original article by [burns2013] again, leveraging the modern tools of the newer graphics APIs.
In contrast to [burns2013], the actual storage of triangles in our implementation of a Visibility Buffer happens after the triangle removal and draw compaction step, with an optimally “massaged” data set.
Having removed overdraw in the Visibility Buffer and Depth Buffer, we run a shading approach that shades everything with one regular draw call. We call this shading stage Forward++ due to its resemblance to forward shading and its usage of a tiled light list for applying many lights. It is a step up from Forward+, which requires numerous draw calls.
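The tiled light list that Forward++ relies on can be sketched as follows (a hedged CPU illustration; tile size, the screen-space light representation, and all names are made up, not The Forge's actual implementation): the screen is divided into tiles, each tile collects the indices of lights whose radius touches it, and shading then only loops over that short per-tile list.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Screen-space point light, simplified for illustration.
struct PointLight { float x, y, radius; };

// Bin every light into each screen tile its bounding square overlaps.
// The shading pass later reads only tiles[tileIndex] per pixel.
std::vector<std::vector<int>> buildTileLightLists(
        const std::vector<PointLight>& lights,
        int width, int height, int tileSize) {
    int tilesX = (width + tileSize - 1) / tileSize;
    int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> tiles(tilesX * tilesY);
    for (int li = 0; li < (int)lights.size(); ++li) {
        const PointLight& l = lights[li];
        // conservative tile range covered by the light's bounding square
        int x0 = std::max(0, (int)((l.x - l.radius) / tileSize));
        int x1 = std::min(tilesX - 1, (int)((l.x + l.radius) / tileSize));
        int y0 = std::max(0, (int)((l.y - l.radius) / tileSize));
        int y1 = std::min(tilesY - 1, (int)((l.y + l.radius) / tileSize));
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                tiles[ty * tilesX + tx].push_back(li);
    }
    return tiles;
}
```

With thousands of lights, each pixel only ever iterates the handful of indices in its own tile, which is what makes a single full-screen shading draw feasible.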
We described all this in several talks at game industry conferences, for example at GDCE 2016 [engel16] and during XFest 2018, showing considerable performance gains due to reduced memory bandwidth compared to traditional G-buffer-based rendering architectures.
A blog post, updated over the years, about what we now call Triangle Visibility Buffer 1.0 (TVB 1.0) can be found here: [engel18].
Over the last years we extended this original idea with an Order-Independent Transparency approach (it is more efficient to sort triangle IDs in a per-pixel linked list than to store layers of a G-buffer) and software VRS. We then developed a Visibility Buffer approach that doesn't require draw calls to fill the depth and Visibility Buffer, and one that requires far fewer draw calls in parallel.
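The per-pixel linked list mentioned above can be sketched on the CPU (a hedged illustration with invented names; on the GPU the counter bump is an atomic add and the head write an atomic exchange): each pixel keeps a list head into a shared node pool of (triangle ID, depth) entries, and the resolve pass sorts each pixel's short list instead of storing shaded G-buffer layers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One node per transparent fragment: which triangle, how deep, and the
// previous list head so nodes chain into a per-pixel linked list.
struct OITNode { uint32_t triangleId; float depth; int32_t next; };

struct OITBuffers {
    std::vector<int32_t> head;   // one list head per pixel, -1 = empty
    std::vector<OITNode> pool;   // shared node pool (atomic counter on GPU)
    explicit OITBuffers(int pixels) : head(pixels, -1) {}
};

// Fragment-shader equivalent: push a new node onto the pixel's list.
void insertFragment(OITBuffers& b, int pixel, uint32_t triId, float depth) {
    int32_t node = (int32_t)b.pool.size();         // atomicAdd on GPU
    b.pool.push_back({triId, depth, b.head[pixel]});
    b.head[pixel] = node;                          // atomic exchange on GPU
}

// Resolve pass: gather the pixel's fragments, sort back-to-front by depth,
// and return the triangle IDs in compositing order.
std::vector<uint32_t> resolvePixel(const OITBuffers& b, int pixel) {
    std::vector<OITNode> frags;
    for (int32_t n = b.head[pixel]; n != -1; n = b.pool[n].next)
        frags.push_back(b.pool[n]);
    std::sort(frags.begin(), frags.end(),
              [](const OITNode& a, const OITNode& c) { return a.depth > c.depth; });
    std::vector<uint32_t> order;
    for (const OITNode& f : frags) order.push_back(f.triangleId);
    return order;
}
```

Sorting a few small IDs per pixel is cheap compared to the bandwidth of writing and re-reading full G-buffer layers per transparency level.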
This release offers what we call an updated Triangle Visibility Buffer 1.0 (TVB 1.0) and a prototype of the Triangle Visibility Buffer 2.0 (TVB 2.0).
The changes to TVB 1.0 are evolutionary. We used to map each mesh to an indirect draw element. This required the use of DrawID to map back to the per-mesh data. When working on a game engine with a very high number of draw calls, it imposed a limitation on the number of "draws" we could do, due to having only a limited number of bits available in the Visibility Buffer.
Additionally, instancing was implemented using a separate instanced draw for each instanced mesh. We refactored the data flow between the draws and the shade pass.
There is now no reliance on DrawID and instances are handled transparently using the same unified draw. This both simplifies the flow of data and allows us to draw more "instanced" meshes.
Apart from being able to use a very high number of draw calls, the performance didn't change.
The new TVB 2.0 approach is revolutionary in the sense that it doesn't use draw calls anymore to fill the depth and visibility buffer. Two compute shader invocations filter triangles and eventually fill the depth and visibility buffer.
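One common way a compute shader can fill depth and visibility without draw calls is to pack depth into the high bits and the triangle ID into the low bits of a single 64-bit value, then take a per-pixel minimum (an atomic min on the GPU). The sketch below illustrates that packing trick on the CPU; the encoding and names are our own illustration, not necessarily what the TVB 2.0 prototype does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One 64-bit value per pixel: depth in the high 32 bits, triangle ID in
// the low 32 bits, so a plain numeric min keeps the closest triangle.
using VisPixel = uint64_t;

inline VisPixel packDepthId(float depth01, uint32_t triangleId) {
    uint64_t d = (uint64_t)(depth01 * 4294967295.0); // quantize to 32 bits
    return (d << 32) | triangleId;                   // smaller == closer
}

inline uint32_t unpackId(VisPixel v)    { return (uint32_t)(v & 0xffffffffu); }
inline float    unpackDepth(VisPixel v) { return (float)((v >> 32) / 4294967295.0); }

// "Rasterize" one fragment: min keeps the nearest triangle per pixel.
// On the GPU this is a 64-bit atomic min; no draw call is involved.
inline void writeFragment(std::vector<VisPixel>& vb, int pixel,
                          float depth01, uint32_t triangleId) {
    VisPixel candidate = packDepthId(depth01, triangleId);
    if (candidate < vb[pixel]) vb[pixel] = candidate;
}
```

After all fragments are written, the buffer simultaneously holds the depth buffer (high bits) and the visibility buffer (low bits) for the shading pass to consume.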
Not using draw calls anymore makes the whole code base more consistent and less convoluted compared to TVB 1.0.
You can now find the new Visibility Buffer 2 approach in
The-Forge\Examples_3\Visibility_Buffer2
This is still in an early stage of development. We only support a limited number of platforms: Windows D3D12, PS4/5, XBOX, and macOS / iOS.
We cleaned up the whole initRenderer code, merged GPUConfig into GraphicsConfig, and unified naming.
We improved the Metal Validation Support.
Everything related to Art assets is now in the Art folder.
Lots of fixes everywhere.
References:
[burns2013] Christopher A. Burns, Warren A. Hunt, "The Visibility Buffer: A Cache-Friendly Approach to Deferred Shading", 2013, Journal of Computer Graphics Techniques (JCGT) 2:2, Pages 55 - 69.
[schied2015] Christoph Schied, Carsten Dachsbacher, "Deferred Attribute Interpolation for Memory-Efficient Deferred Shading" , Kit Publication Website: http://cg.ivd.kit.edu/publications/2015/dais/DAIS.pdf
[schied2016] Christoph Schied, Carsten Dachsbacher, "Deferred Attribute Interpolation Shading", 2016, GPU Pro 7, Pages
[chajdas] Matthaeus Chajdas, GeometryFX, 2016, AMD Developer Website http://gpuopen.com/gaming-product/geometryfx/
[wihlidal17] Graham Wihlidal, "Optimizing the Graphics Pipeline with Compute", 2017, GPU Zen 1, Pages 277--320
[engel16] Wolfgang Engel, "4K Rendering Breakthrough: The Filtered and Culled Visibility Buffer", 2016, GDC Vault: https://www.gdcvault.com/play/1023792/4K-Rendering-Breakthrough-The-Filtered
[engel18] Wolfgang Engel, "Triangle Visibility Buffer", 2018, Wolfgang Engel's Diary of a Graphics Programmer Blog http://diaryofagraphicsprogrammer.blogspot.com/2018/03/triangle-visibility-buffer.html
Published by AntoineConffx 7 months ago
We are sponsoring I3D again. Come by and say hi! We will also be giving a talk on the new developments around the Triangle Visibility Buffer.
We have worked on Warzone Mobile since August 2020. The game launched on March 21, 2024.
We removed CPU cluster culling and simplified the animation data usage. Now triangle filtering only takes one dispatch each frame again.
We integrated the Swappy frame pacer into the Android / Vulkan ecosystem.
We did another pass on the GPUCfg system; now we can generate the vendor IDs and model IDs with a Python script to keep the *_gpu.data list easily up to date for each platform.
We removed most of the name comparisons and replaced them with ID comparisons, which should speed up parsing time and is more specific.
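The difference is easy to see in a small sketch (names and model IDs below are illustrative; the vendor IDs shown are the well-known PCI vendor IDs, but the struct is not the actual GPUCfg code): an ID match is two integer compares, while a name match means substring-searching marketing strings that vary between drivers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative GPU identity: PCI vendor ID plus device/model ID.
struct GpuId { uint32_t vendorId; uint32_t modelId; };

// Exact and fast: two integer comparisons, no ambiguity.
inline bool sameGpu(const GpuId& a, const GpuId& b) {
    return a.vendorId == b.vendorId && a.modelId == b.modelId;
}

// The fragile alternative this replaces: matching on a human-readable
// device name, which is slower and depends on driver naming.
inline bool nameMatches(const std::string& deviceName, const std::string& needle) {
    return deviceName.find(needle) != std::string::npos;
}
```

A generated ID table also survives driver renames ("Radeon RX 570" vs. "AMD Radeon (TM) RX 570"), which is exactly where string matching breaks down.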
We added screen-space shadows to the set of shadow approaches in that unit test. These are complementary to regular shadow mapping and add more detail. We also fixed a number of inconsistencies with the other shadow map approaches.
PS5 - Screen-Space Shadows on
PS5 - Screen-Space Shadows off
Nintendo Switch
PS4
Now you can have GPU crash reports on all platforms. We skipped OpenGL ES and DX11, though.
A simple example of a crash report is this:
2024-04-04 23:44:08 [MainThread ] 09a_HybridRaytracing.cp:1685 ERR| [Breadcrumb] Simulating a GPU crash situation (RAYTRACE SHADOWS)...
2024-04-04 23:44:10 [MainThread ] 09a_HybridRaytracing.cp:2428 INFO| Last rendering step (approx): Raytrace Shadows, crashed frame: 2
We will extend the reporting a bit more over time.
Published by AntoineConffx 8 months ago
We improved Ephemeris again and now support it on more platforms, updating some of the algorithms used and adding more features.
We now support PC, XBOXes, PS4/5, Android, Steamdeck, and iOS (requires iPhone 11 or higher); Switch is not supported so far.
Ephemeris on XBOX Series X
Ephemeris on Android
Ephemeris on PS4
Ephemeris on PS5
We changed the graphics interface for cmdBindRenderTargets
// old
DECLARE_RENDERER_FUNCTION(void, cmdBindRenderTargets, Cmd* pCmd, uint32_t renderTargetCount, RenderTarget** ppRenderTargets, RenderTarget* pDepthStencil, const LoadActionsDesc* loadActions, uint32_t* pColorArraySlices, uint32_t* pColorMipSlices, uint32_t depthArraySlice, uint32_t depthMipSlice)
// new
DECLARE_RENDERER_FUNCTION(void, cmdBindRenderTargets, Cmd* pCmd, const BindRenderTargetsDesc* pDesc)
Instead of a long list of parameters, we now provide a struct that gives us enough flexibility to pack more functionality in there.
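A hedged sketch of what the parameter-struct pattern buys (the field names below are illustrative stand-ins, not the actual BindRenderTargetsDesc members): a struct with sane defaults lets callers set only what they need, and new options can be added later without breaking every call site, unlike a positional parameter list.

```cpp
#include <cassert>
#include <cstdint>

struct RenderTarget; // opaque, as in the real interface

// Illustrative desc struct: defaulted fields mean a depth-only pass can
// leave everything else untouched. The real struct's fields may differ.
struct BindRenderTargetsDescSketch {
    uint32_t       renderTargetCount = 0;
    RenderTarget** ppRenderTargets   = nullptr;
    RenderTarget*  pDepthStencil     = nullptr;
    uint32_t       depthArraySlice   = 0;
    uint32_t       depthMipSlice     = 0;
};

// Stand-in for the real function: validates the desc it consumes.
// Growing the struct never changes this signature.
bool cmdBindRenderTargetsSketch(const BindRenderTargetsDescSketch* pDesc) {
    if (!pDesc) return false;
    return pDesc->renderTargetCount == 0 || pDesc->ppRenderTargets != nullptr;
}
```

A caller doing a depth-only pass simply zero-initializes the struct and sets `pDepthStencil`, while the old signature forced it to spell out every unused parameter.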
We added Variable Rate Shading to the Visibility Buffer OIT example test 15a. This way we have a better looking test scene with St. Miguel.
VRS allows rendering parts of the render target at different resolution based on the auto-generated VRS map, thus achieving higher performance with minimal quality loss. It is inspired by Michael Drobot's SIGGRAPH 2020 talk: https://docs.google.com/presentation/d/1WlntBELCK47vKyOTYI_h_fZahf6LabxS/edit?usp=drive_link&ouid=108042338473354174059&rtpof=true&sd=true
The key idea behind the software-based approach is to render everything into 4xMS targets and use a stencil buffer as a VRS map. The VRS map is automatically generated based on the local image gradients.
It can be used on a far wider range of platforms and devices than the hardware-based approach, since hardware VRS support is broken or missing on many platforms. Because this software approach utilizes 2x2 tiles, we can also achieve higher image quality compared to hardware-based VRS.
Shading rate view based on the color per 2x2 pixel quad:
PC
Debug Output with the original Image on PC
PC
Debug Output with the original Image on PC
Android
Debug Output with the original Image on Android
Android
Debug Output with the original Image on Android
UI description:
Supported platforms:
PS4, PS5, all XBOXes, Nintendo Switch, Android (Galaxy S23 and higher), Windows(Vulkan/DX12), macOS/iOS.
You will want to check out those files. They are now dedicated per supported platform, so it is easier for us to distinguish between the different PlayStations, XBOXes, Switches, Android, iOS, etc.
The Unlinked Multi GPU example was broken on AMD 7x GPUs with Vulkan. This looks like a bug. We had to disable DCC to make that work.
We now track GPU memory and will extend this to other platforms.
We now support the VK_EXT_ASTC_DECODE_MODE_EXTENSION_NAME extension.
Various bug fixes to make this more stable. Still alpha ... will crash.
Published by AntoineConffx 9 months ago
Our last release was in October 2022. We were so busy that we lost track of time. In March 2023 we planned to make the next release, and we kept testing, fixing, and improving code up until today. The improvements coming back from the 8-10 projects we typically work on in parallel were so many that it was hard to integrate, test, and then maintain them all. To a certain degree our business has higher priority than making GitHub releases, but we realize that letting a lot of time pass makes it substantially harder to get the whole code base back in shape, even with a company of nearly 40 graphics programmers. So we cut down on functional and unit tests to have fewer variables. We also restructured large parts of our code base so that it is easier to maintain. One of the constant maintenance challenges was the macOS / iOS run-time (more about that below).
We invested a lot in our testing environment. We have more consoles for testing now, and we also have a much-needed screenshot testing system. We now outsource more testing to external service providers. We removed Linux as a stand-alone target, but the native Steamdeck support should make up for this.
We tried to be conservative about increasing API versions because we know that on many platforms our target group will use older OS or API implementations. Nevertheless, we were more adventurous this year than before, so we bumped versions up by a larger step than in previous years.
Our next release is planned for about four weeks from now. We still have work to do to bring up a few source code parts, but now the increments are much smaller.
In the meantime some of the games we worked on, or are still working on, shipped:
Forza Motorsport has launched in the meantime:
Starfield has launched:
No Man's Sky has launched on macOS:
This unit test demonstrates a software-based variable rate shading (VRS) technique that allows rendering parts of the render target at different resolutions based on an auto-generated VRS map, thus achieving higher performance with minimal quality loss. It is inspired by Michael Drobot's SIGGRAPH 2020 talk: https://docs.google.com/presentation/d/1WlntBELCK47vKyOTYI_h_fZahf6LabxS/edit?usp=drive_link&ouid=108042338473354174059&rtpof=true&sd=true
PC Windows (2560x1080):
Switch (1280x720):
XBOX One S (1080p):
PS4 Pro (3840x2160):
The key idea behind the software-based approach is to render everything in 4xMS targets and use a stencil buffer as a VRS map. The VRS map is automatically generated based on the local image gradients.
The advantage of this approach is that it runs on a wider range of platforms and devices than the hardware-based approach since the hardware VRS support is broken or not supported on many platforms. Because this software approach utilizes 2x2 tiles we can also achieve higher image quality compared to hardware-based VRS.
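The gradient-based map generation can be sketched as follows (a hedged illustration: the thresholds, the luminance input, and the rate encoding are invented, not the shipped shader): per 2x2 quad, measure horizontal and vertical luminance deltas and pick a coarser shading rate wherever the image is locally flat in that direction.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative shading rates: how many pixels one shaded sample covers.
enum ShadingRate { RATE_1X1, RATE_2X1, RATE_1X2, RATE_2X2 };

// Classify one 2x2 quad of a luminance image (row-major, 'width' wide).
// Low gradient in a direction means shading can be coarsened there.
ShadingRate rateForQuad(const std::vector<float>& luma, int width,
                        int x, int y, float threshold) {
    float tl = luma[y * width + x],       tr = luma[y * width + x + 1];
    float bl = luma[(y + 1) * width + x], br = luma[(y + 1) * width + x + 1];
    float dx = std::fabs(tl - tr) + std::fabs(bl - br); // horizontal detail
    float dy = std::fabs(tl - bl) + std::fabs(tr - br); // vertical detail
    if (dx < threshold && dy < threshold) return RATE_2X2; // flat: coarsest
    if (dx < threshold) return RATE_2X1;  // flat horizontally only
    if (dy < threshold) return RATE_1X2;  // flat vertically only
    return RATE_1X1;                      // detailed: full rate
}
```

In the real pipeline the result of this classification is written into the stencil buffer, which then acts as the VRS map for the 4xMS resolve.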
Shading rate view based on the color per 2x2 pixel quad:
UI description:
Supported platforms:
PS4, PS5, all XBOXes, Nintendo Switch, Android (Galaxy S23 and higher), Windows(Vulkan/DX12).
Implemented on macOS/iOS, but it doesn't give the expected performance benefits due to an issue with stencil testing on that platform.
To enable re-compilation of shaders during run-time, we implemented a cross-platform shader server that allows recompiling shaders by pressing CTRL-S or a button in a dedicated menu.
You can find the documentation in the Wiki in the FSL section.
When working remotely, on mobile, or on console, it can be cumbersome to control the development UI.
We added a remote control application in Common_3\Tools\UIRemoteControl which allows control of all UI elements on all platforms.
It works as follows:
This is alpha software so expect it to crash ...
This extension allows developers to query for additional information on GPU faults which may have caused device loss, and to generate binary crash dumps.
We switched to Ray Queries for the common Ray Tracing APIs on all the platforms we support. The current Ray Tracing APIs substantially increase the amount of memory necessary, decrease performance, and can't add much visually, because the whole game has to run with lower resolution, lower texture resolution, and lower graphics quality (to make up for this, upscalers were introduced that add new issues to the final image).
Because Ray Tracing became a marketing term valuable to GPU manufacturers, some game developers now support Ray Tracing to help increase hardware sales. So we are going with the flow here by offering those APIs.
macOS (1440x810)
PS5 (3840x2160)
Windows 10 (2560x1080)
XBOX One Series X (1920x1080)
iPhone 11 (Model A2111) at resolution 896x414
We do not have a denoiser for the Path Tracer.
This is a cross-platform system that can track GPU capabilities on all platforms and switch features of a game on and off per platform. To read more about this, follow the link below.
We think the Metal API is a well-balanced graphics API that walks the line between low-level and high-level very well. We ran into one general problem with the Metal API on both platforms: it is hard to maintain the code base. There is an architectural problem that was probably introduced due to a lack of experience in shipping games.
In essence what Apple decided to do is have calls like this:
https://developer.apple.com/documentation/swift/marking-api-availability-in-objective-c
Anything a hardware vendor describes as available and working might not be working with the next upgrade of the operating system, hardware or just the API or XCode.
If you have a few hundred of those macros in your code, it becomes a lottery what works and what does not on a variety of hardware. On some hardware one thing is broken, on other hardware something else.
So there are two ways to deal with this: for every @available macro, you start adding a #define to switch off or replace that code based on the underlying hardware and software platform. You would have to manually track whether what the macro says is true across a wide range of platforms with different outcomes.
For example, on macOS 10.13 running on a certain MacBook Pro (a made-up example) with an Intel GPU a feature is broken, but a very similar MacBook Pro that additionally has a dedicated GPU actually runs it. Now you have to track what "class of MacBook Pro" we are talking about and whether the MacBook Pro in question has an Intel or an AMD GPU.
We track all this data already so that is not a problem. We know exactly what piece of hardware we are looking at (see above GPU Config system).
The problem is that we would have to guard every @available macro with some of this. From a QA standpoint, that generates an explosion of QA requests. To cut down on the number of variables, we decided to focus only on calls that are available in two different macOS and two different iOS versions. Here is the code in iOperatingSystem.h:
// Support fixed set of OS versions in order to minimize maintenance efforts
// Two runtimes. Using IOS in the name as iOS versions are easier to remember
// Try to match the macOS version with the relevant features in iOS version
#define IOS14_API API_AVAILABLE(macos(11.0), ios(14.1))
#define IOS14_RUNTIME @available(macOS 11.0, iOS 14.1, *)
#define IOS17_API API_AVAILABLE(macos(14.0), ios(17.0))
#define IOS17_RUNTIME @available(macOS 14.0, iOS 17.0, *)
We were one of the big proponents of the Dynamic Rendering extension. As game developers, we took over part of the driver development by adopting Vulkan as a graphics API. The cost of game production rose substantially because our QA efforts had to be increased to deal with an API that is lower level.
One of the interesting findings made by many who adopted Vulkan in games is that a Vulkan run-time in most cases runs slower on the GPU than a DirectX 11 run-time. It is very hard to optimize a Vulkan run-time to run as fast on the GPU as a DirectX 11 run-time, because parts of the responsibilities shifted from the device driver writer to the game developer. The main advantage of using Vulkan is the lower CPU overhead: while older DirectX 11 drivers were so inefficiently programmed that they bogged down high-end PC CPUs and no longer allowed running two GPUs in SLI / Crossfire (hence the practical death of multi-GPU support for games), Vulkan has a substantially lower CPU overhead.
That being said, the one thing that shouldn't have been moved from the driver into the game developer's space is the render pass concept. It is so close to the hardware that device driver writers can deal with it much better than game developers. In our measurements on tiled hardware renderers, using tiles never had a positive effect. We can imagine that a device driver writer with direct access to the hardware can make tiles run much faster than we can.
This is why we embrace the dynamic rendering extension.
glTF is an art exchange format, not a format suitable for loading game assets. There are mostly two reasons for this: it doesn't make sense to load one glTF file for each "model", and it also doesn't make sense to load one glTF file for a whole scene, because that would not allow streaming. Generally, the way we load art assets is one large zipped file opened with one "fopen" call. To avoid any other OS calls, we load all the assets from this large file by looking into a look-up table, finding the address in memory, pointing there, and then copying the data into system memory (a simplified view, but it should suffice for this purpose). To allow streaming, depending on the type of game, we pre-package art assets in meaningful ways, so when we load on demand they are laid out as expected.
glTF does not align with this concept or the fact that we compress data to fit into our internal caches. In other words it doesn't make sense to use glTF during run-time of a game.
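The look-up-table loading scheme described above can be sketched like this (a hedged illustration with an invented in-memory format; the actual package layout and names in The Forge differ): one read brings the whole package into memory, and each asset is then found via a table of offsets with no further OS calls per asset.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative in-memory package: the single big file plus a table that
// maps asset names to (offset, size) pairs inside the blob.
struct PackedFile {
    std::vector<unsigned char> blob; // contents of the one large file
    std::unordered_map<std::string, std::pair<size_t, size_t>> table;
};

// Load one asset: look up its offset, point into the blob, copy out.
// No fopen/read per asset -- just a table lookup and a memcpy.
std::vector<unsigned char> loadAsset(const PackedFile& pak,
                                     const std::string& name) {
    auto it = pak.table.find(name);
    if (it == pak.table.end()) return {};
    const size_t offset = it->second.first;
    const size_t size   = it->second.second;
    std::vector<unsigned char> out(size);
    std::memcpy(out.data(), pak.blob.data() + offset, size);
    return out;
}
```

Pre-packaging assets in streaming-friendly order then amounts to deciding which entries sit next to each other in the blob.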
The Asset Pipeline now loads, converts, and optimizes glTF data into our internal format offline. This can be adjusted to the needs of any type of game and currently represents a proof of concept.
Today there is no point in using Sponza as a test scene anymore. One can argue that even St. Miguel does not fulfill that purpose; nevertheless, it is currently the best we have. We removed all Sponza art assets and replaced them with St. Miguel.
We updated the Wiki documentation. Check it out. We know it could be more ...
Published by AntoineConffx 9 months ago
The Starfield Official Gameplay Reveal Trailer is out. It always brings us pleasure to see The Forge running in AAA games like this:
We added The Forge to the Creation Engine in 2019.
The Forge made an appearance during the Apple developer conference 2022. We added it to the game "No Man's Sky" from Hello Games to bring this game up on macOS / iOS. For the Youtube video click on the image below and jump to 1:22:40
We switched our Linux OS to Manjaro to have an easier upgrade path to the Steamdeck. Please note the changed Linux requirements below.
Shader byte code can now be generated offline.
Over the last few projects we always had challenges with EASTL. So over the last nine months we slowly removed it and replaced it with new C-language-based containers that prefer stack allocations over heap allocations.
For string management:
bstrlib
For dynamic arrays and hash tables:
stb_ds.h
There is a new unit test to make sure those new containers are tested. It is called 36_AlgorithmsAndContainers
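The "prefer stack allocations over heap allocations" principle can be illustrated with a hedged C++ sketch (this shows the design idea only, not the actual bstrlib or stb_ds containers): an array with a small inline buffer that never touches the heap until it outgrows that storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Small-buffer array for trivially copyable T: the first N elements live
// in inline (stack) storage; the heap is used only after overflow.
template <typename T, size_t N>
struct SmallArray {
    T      inlineBuf[N];
    T*     data  = inlineBuf;  // points at stack storage until overflow
    size_t count = 0;
    size_t cap   = N;

    void push(const T& v) {
        if (count == cap) {                          // grow: move to heap
            size_t newCap = cap * 2;
            T* heap = (T*)std::malloc(newCap * sizeof(T));
            std::memcpy(heap, data, count * sizeof(T));
            if (data != inlineBuf) std::free(data);
            data = heap;
            cap = newCap;
        }
        data[count++] = v;
    }
    bool onStack() const { return data == inlineBuf; }
    ~SmallArray() { if (data != inlineBuf) std::free(data); }
};
```

For the short-lived, small collections that dominate per-frame code, this pattern avoids allocator traffic and the cache misses of pointer-chasing into the heap.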
We changed the app life cycle: modern APIs have so many ways to reset the driver or reload assets that we made a more flexible "reload" mechanism that generalizes all the special cases we had in there before.
New Animation test that unifies most of the former animation tests into one. This way we can save some testing time in our Jenkins setup.
We added a new unit test called 38_AmbientOcclusion_GTAO. It implements the paper "Practical Real-Time Strategies for Accurate Indirect Occlusion" by Jorge Jimenez et al.
macOS
PC
PS4
PS5
Switch
XBOX
We improved the gradient calculation in the Visibility Buffer. Thanks to Stephen Hill @self_shadow who brought this to our attention.
We reorganized the whole TF directory structure to allow development in more areas. Here is an image representing the new structure:
What is still missing is the "Render Abstraction Layer", "Scene Loader" and we have to populate the "Game Layer" more.
We upgraded to ImGUI 1.88 to get access to the docking feature. In the process we improved the ImGUI integration substantially.
We started a blog for The Forge at The-Forge-Blog. We have no idea where we can find the time to write blog posts ... let's see what is happening ...
Retired Unit/Functional Tests:
Published by AntoineConffx 9 months ago
We are always looking for more graphics / engine programmers. We are also specifically looking for a consultant who can help us to scale up our hardware testing environment.
The following list of changes is not fully representative of all the improvements we made, so it is just a selection:
The test contains two projects:
How to use it: while the Main project is running, open 19_CodeHotReload_Game.cpp and make a change; lines marked with TRY_CODE_RELOAD mark easy changes to try. Once the file is saved, rebuild the project and the changes appear automatically.
Note: in this implementation we can't call any functions from The Forge from the hot-reloadable project (19a_CodeHotReload_Game). This is because we compile OS and Renderer as static libraries and link them directly to the exe. Ideally these projects should be compiled as dynamic libraries in order to expose their functionality to the exe and the hot-reloadable DLL. The reason we didn't implement it this way is that all our other projects are already set up to use static libraries.
Linux 1080p resolution
macOS 3200x1760 resolution
PS4 1080p resolution
PS5 4k resolution
Windows 10 1080p resolution
XBOX One (original) 1080p resolution
https://twitter.com/Vuthric/status/1286796950214307840
A detailed description can be found here: https://realtimevfx.com/t/smoke-lighting-and-texture-re-usability-in-skull-bones/5339
In this repository there is a "dlut.blend" file that contains a minimal volumetric render setup. To generate the DLUT image, do the following steps:
The resulting DLUT image should look like this:
The example program running on Android:
Window management - all platforms that support the concept of a windowed application now have a base file named {Platform}Window.cpp. There is now a common UI element that offers multi-monitor support (where supported) and various window settings. There are also LUA scripts that test this functionality in our Jenkins setup.
Android Vulkan Validation layers: we added the validation layer from Khronos GitHub repo as they have stopped shipping the layer in the NDK.
Android Vulkan Validation Layers
You can find them in ThirdParty/OpenSource/AndroidVulkanValidationLayers
This library is a stepping stone toward utilizing more CPU intrinsics on various platforms. You can see its results in the screenshots above, showing the name of the CPU and the supported instruction set. We now also show the GPU name and the driver version the GPU uses.
Upgraded Vulkan and DX Allocators: following the updates to these open-source libraries on GitHub we upgraded our code base accordingly.
macOS / iOS - while working with TF on various projects, we bring back improvements and lessons learned from those projects. You will find numerous macOS / iOS improvements in this release.
For one of the business applications we worked on, we needed double-precision math, so we extended the math library accordingly.
We also improved the input system with HID support, which is an ongoing effort. So, better controller support on more platforms ...
Retired unit tests: we are going to retire many unit tests now because our automated testing cycle takes too long and heats up the "engine" room (see the passage above about us looking for a consultant to scale up our testing environment). Today we retire:
Resolved GitHub Issues:
Happy Holidays! 🎄🎅🔥🎁🧨
We wish you and your loved ones all the best for the Holidays and a Happy New Year 2022!
This update is again a mixture of things we learned while integrating The Forge and feedback and contributions from our users. Thanks for all the support!
In one of the next updates we will remove EASTL and offer dedicated containers compatible with C99. Over time, EASTL was a huge productivity drain. Its inefficient memory access patterns hogged too much CPU time in games where we integrated TF, and we always had to go back and fix those manually later.
We know this is a breaking change, but considering that the STL was a good idea for the CPUs of 20+ years ago, we would like to align more with what modern CPUs expect.
Now our build times are much better and the overall system runs faster:
CPU: Intel i7-7700k
GPU: AMD Radeon RX570
             Old ECS                     flecs
Debug:       Single Threaded: 90.0ms     Single Threaded: 23.0ms
             Multi Threaded:  29.0ms     Multi Threaded:   6.8ms
Release:     Single Threaded:  5.7ms     Single Threaded:  1.7ms
             Multi Threaded:   2.3ms     Multi Threaded:   0.9ms
Descriptor management improvements - we changed the rendering interface for all platforms: cmdBindPushConstants now takes an index instead of a name, and we also allow partial updates of array descriptors.
Borderless window - there are improvements to borderless windows support
FSL improvements
sRGB - all examples should be now more correct when it comes to linear lighting and sRGB
Android Game Extension usage: in one of our AAA game engine projects, we are now successfully using Android Game Extensions for the first time, so we brought them to The Forge as well.
We redid a lot of the Android development setup to streamline the experience a bit. We still use Visual Studio 2017 because it allows us to be more productive compared to other IDEs. Two years ago we went back and forth between IDEs but concluded that the only IDE we could efficiently integrate into our Jenkins testing setup was Visual Studio.
Please check out the new Readme below and let us know if we missed anything.
Vulkan: moved to KHR ray tracing extensions. Upgraded max spec to Vulkan SDK 1.2.162 and tested ray tracing support on an AMD RX 6700 XT GPU
meshoptimizer - somehow the integration of meshoptimizer "got lost" over time, and we just re-integrated it into the resource loader. Here are some numbers we got:
meshoptimizer on various art assets
The game engine BuildBox is now using The Forge (click on Image to go to Buildbox website):
The game Lethis Path of Progress uses The Forge now (click on image to go to the Steam Store)
Published by AntoineConffx almost 3 years ago
Unlinked multiple-GPU support: for professional visualization applications, we now support unlinked multiple GPUs.
A new renderer API is added to enumerate available GPUs.
Renderer creation is extended to allow explicit GPU selection using the enumerated GPU list.
Multiple Renderers can be created this way.
The resource loader interface has been extended to support multiple Renderers.
It is initialized with the list of all Renderers created.
To select which Renderer (GPU) resources are loaded on, the NodeIndex used in linked GPU configurations is reused for the same purpose.
Resources cannot be shared across multiple Renderers; they must be duplicated explicitly if needed.
To retrieve generated content from one GPU to another (e.g. for presentation), a new resource loader operation is provided to schedule a transfer from a texture to a buffer. The target buffer should be mappable.
This operation requires proper synchronization with the rendering work; a semaphore can be provided to the copy operation for that purpose.
Available with Vulkan and D3D12.
For other APIs, the enumeration API will not create a RendererContext, which indicates the lack of unlinked multi-GPU support.
Config.h: We now have a central config.h file that can be used to configure TF.
Common_3/OS/Core/Config.h
Common_3/Renderer/RendererConfig.h
Common_3/Renderer/{RenderingAPI}/{RenderingAPI}Config.h
* Modified PyBuild.py
* Proper handling of config options.
* Every config option has --{option-name}/--no-{option-name} flag that uses define/undef directives to enable/disable macros.
* Macros are guarded with ifndef/ifdef.
* Updated Android platform handling
* Deleted Common_3/Renderer/Compiler.h. Its functionality was moved into Config.h
* Moved all macro options to config files
* Renamed USE_{OPTION_NAME} to ENABLE_{OPTION_NAME}
* Changed some macros to be defined/not defined instead of having values of 0 or 1.
* Deleted all DISABLE_{OPTION_NAME} macros
* When detecting ray tracing, replaced ENABLE_RAYTRACING with RAYTRACING_AVAILABLE, because not all projects need ray tracing even when it is available. RendererConfig.h defines the ENABLE_RAYTRACING macro if ray tracing is available, so it can be commented out in a single place instead of having to be hunted down for every platform
* Removed most of the macro definitions from build systems. Some of the remaining macros are:
* Target platform macros: NX64, QUEST_VR
* Arm neon macro ANDROID_ARM_NEON.
* Windows suppression macros (like _CRT_SECURE_NO_WARNINGS)
* Macros specific to gainputstatic
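The --{option-name}/--no-{option-name} pattern from PyBuild.py can be sketched with argparse. This is an illustration of the pattern, not the actual PyBuild.py code:

```python
import argparse

def build_parser(options):
    # Each config option gets a paired --name / --no-name flag. The default
    # of None means "flag not given", so the macro is left untouched.
    parser = argparse.ArgumentParser()
    for opt in options:
        parser.add_argument(f"--{opt}", dest=opt, action="store_true",
                            default=None)
        parser.add_argument(f"--no-{opt}", dest=opt, action="store_false",
                            default=None)
    return parser

def emit_defines(options, args):
    # Translate parsed flags into #define / #undef directives, guarding
    # the ENABLE_{OPTION_NAME} macros.
    lines = []
    for opt in options:
        value = getattr(args, opt)
        if value is True:
            lines.append(f"#define ENABLE_{opt.upper()}")
        elif value is False:
            lines.append(f"#undef ENABLE_{opt.upper()}")
    return lines

options = ["raytracing", "profiler"]
args = build_parser(options).parse_args(["--raytracing", "--no-profiler"])
print(emit_defines(options, args))
# ['#define ENABLE_RAYTRACING', '#undef ENABLE_PROFILER']
```

Because both flags share the same `dest`, the last one given on the command line wins, and an omitted option stays `None` so the build keeps its default.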
glTF Viewer running on Android Galaxy Note 9
glTF Viewer running on iPhone 7
glTF Viewer running on Linux with NVIDIA RTX 2060
glTF Viewer running on Mac Mini M1
glTF Viewer running on PS5
glTF Viewer running on Switch
glTF Viewer running on XBOX One Original
Good read on specialization constants; the same concepts apply to function constants on Metal.
Declared at global scope using the SHADER_CONSTANT macro. Used like any regular variable after declaration.
Macro arguments:
#define SHADER_CONSTANT(INDEX, TYPE, NAME, VALUE)
Example usage:
SHADER_CONSTANT(0, uint, gRenderMode, 0);
// Vulkan - layout (constant_id = 0) const uint gRenderMode = 0;
// Metal - constant uint gRenderMode [[function_constant(0)]];
// Others - const uint gRenderMode = 0;
void main()
{
    // Can be used like regular variables in shader code
    if (gRenderMode == 1)
    {
        //
    }
}
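Since the FSL translator is written in Python, the per-platform expansion shown in the comments above can be sketched roughly like this. This is a simplified illustration, not the actual translator code:

```python
def expand_shader_constant(platform, index, type_, name, value):
    # Emit the platform-native declaration for a SHADER_CONSTANT.
    if platform == "vulkan":
        # Vulkan specialization constant.
        return (f"layout (constant_id = {index}) "
                f"const {type_} {name} = {value};")
    if platform == "metal":
        # Metal function constant.
        return f"constant {type_} {name} [[function_constant({index})]];"
    # Fallback for APIs without specialization constants: a plain constant.
    return f"const {type_} {name} = {value};"

print(expand_shader_constant("vulkan", 0, "uint", "gRenderMode", 0))
# layout (constant_id = 0) const uint gRenderMode = 0;
```

The shader author writes one SHADER_CONSTANT line; the translator picks the declaration form, and the platform's own shader compiler does the actual compilation.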
Published by AntoineConffx almost 3 years ago
Quest 2 running 01_Transformations
Quest 2 running 09_ShadowPlayground
iMac with M1 chip running at 3840x2160 resolution
iPad with M1 chip running at 1024x1366 resolution
It is astonishing how well the iPad with the M1 chip performs.
Due to what we consider driver bugs, M1 hardware crashes in
Published by AntoineConffx over 3 years ago
This is our biggest update since we started this repository more than three years ago. This update is one of those "what we have learned from the last couple of projects that are using TF" updates and a few more things.
Aura - Windows DirectX 12 Geforce 980TI 1080p Driver 466.47
Aura - Windows Vulkan Geforce 980TI 1080p Driver 466.47
Aura - Ubuntu Vulkan Geforce RTX 2080 1080p
Aura - PS4
Aura - XBOX One original
Forge Shader Language (FSL) translator - after struggling with writing a shader translator for one and a half years, we restarted from scratch. This time we developed everything in Python, because it is cross-platform. We also picked a really "low-tech, keep it simple" approach. The idea is that a small game team can actually maintain the code base and write shaders efficiently. We wanted a shader translator that translates an FSL shader into the native shader language of each platform. This way, whatever shader compiler is used on that platform can take over the actual job of compiling the native code.
The reason we are doing this lies mostly in the general unreliability of DXC and SPIR-V, especially when it comes to cross-platform translation.
There is a Wiki entry that holds an FSL language primer and general information on how this works here:
Run-Time API Switching - we had some form of run-time API switching in an early version of The Forge. At the time, we did not expect it to be very useful, because most game teams do not switch APIs on the fly. In the meantime we found a use case on Android, where we have to reach a large number of devices. So we came up with a better solution that is more consistent with the overall architecture and works on at least the PC and Android platforms.
On Windows PC one can switch between DX12, Vulkan, and DX11 if all are supported. On Android one can switch between Vulkan and OpenGL ES 2.0. The latter allows us to target a much larger group of devices for business application frameworks. We could easily extend this architecture to other platforms like consoles.
This new API switching required us to change the rendering interfaces. It is therefore a breaking change for existing implementations, but we think upgrading takes little effort, and the resulting code is easier to read and maintain, improving the code base overall by being more consistent.
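Per-platform API selection can be modeled as a simple preference list. This is an illustrative sketch only; the real switching logic lives in the rendering interfaces:

```python
def pick_api(platform, supported):
    # Preference order per platform, as described above: DX12 over Vulkan
    # over DX11 on Windows; Vulkan over OpenGL ES 2.0 on Android.
    preferences = {
        "windows": ["DX12", "Vulkan", "DX11"],
        "android": ["Vulkan", "GLES2"],
    }
    for api in preferences.get(platform, []):
        if api in supported:
            return api
    return None  # no supported API on this platform

print(pick_api("android", {"GLES2"}))  # GLES2 on a low-end device
```

The GLES2 fallback is exactly what widens device reach: a phone without a usable Vulkan driver still gets a working renderer.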
Device Reset - This was implemented together with API switching. Windows forces game developers to respond to a crashing device driver by resetting the device. We implemented the functionality already in the last update here on GitHub. This update integrates it better into the OS base layer.
We also verified that the life cycle management for Windows in each application based on the IApp interface works now for device change, device reset and for API switching so that we can cover all cases of losing and recovering the device.
The functions for API switching and device reload and reset are:
void onRequestReload();
void onDeviceLost();
void onAPISwitch();
Variable Rate Shading (VRS) - we implemented VRS in a new unit test 35_VariableRateShading. It is only supported by DirectX 12 on Windows and XBOX Series S / X.
In this demo, we demonstrate two main ways of setting the shading rate:
The cubes use a per-draw shading rate, while the background uses a per-tile shading rate.
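The per-tile path amounts to filling a small rate map over screen tiles, typically coarser away from the center. A sketch of such a map (illustrative only, not the unit-test code):

```python
def build_rate_map(tiles_x, tiles_y):
    # Shading rate per tile: 1x1 (full rate) near the screen center,
    # 2x2 further out, 4x4 at the periphery.
    cx, cy = (tiles_x - 1) / 2.0, (tiles_y - 1) / 2.0
    max_d = max(cx, cy)
    rates = []
    for y in range(tiles_y):
        row = []
        for x in range(tiles_x):
            # Chebyshev distance from the center, normalized to [0, 1].
            d = max(abs(x - cx), abs(y - cy)) / max_d
            row.append("1x1" if d < 0.34 else "2x2" if d < 0.67 else "4x4")
        rates.append(row)
    return rates

rate_map = build_rate_map(8, 5)  # e.g. a 1280x800 target with 160px tiles
```

The GPU then shades each tile at its listed rate, spending full-rate work only where it is most visible.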
Notes:
Multi-Sample Anti-Aliasing (MSAA) - we added a dynamic way of picking MSAA to unit test 9 and the Visibility Buffer example on all platforms.
PC
PS4
PS5
Android & OpenGL ES 2 - the OpenGL ES 2 layer for Android is now more stable, better tested, and closer to production code. As mentioned above, on an Android phone one can switch between Vulkan and OpenGL ES 2 dynamically if both are supported.
Android & OpenGL ES 2 now additionally support unit test 17 - Entity Component System Test.
In general, we are testing many Android phones at the moment, at both the low and high ends of the spectrum, driven by the two Android projects we are currently working on.
PVS-Studio - we did another manual pass on the code base with PVS-Studio, a static code analyzer, to increase code quality.
Published by JenkinsConffx almost 4 years ago
As the year slowly winds down, we finally found time to do another release. First of all, Happy Holidays and a Happy New Year!
Most of us will take time off over the holiday season to spend it with our families. We should be back online in the middle of January 2021.
In the *OSBase.* files you can find a snippet of code that looks like this:

if (pApp->mSettings.mResetGraphics)
{
    pApp->Unload();
    pApp->Load();
    pApp->mSettings.mResetGraphics = false;
}
Breadcrumbs are user-defined markers used to pinpoint which command caused the GPU to stall.
In the Breadcrumb unit test, two markers get injected into the command list.
Pressing the crash button would result in a GPU hang.
In this situation, the first marker would be written before the draw command, but the second one would wait for the draw command to finish.
Due to the infinite loop in the shader, the second marker won't be written, and we can reason that the draw command has caused the GPU to hang.
We log the markers' information to verify this.
Check out this link for more info: D3D12 Device Removed Extended Data (DRED)
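The reasoning above can be modeled with a minimal simulation. This is an illustration of the inference, not the unit-test code; real breadcrumbs are written by the GPU into a dedicated buffer:

```python
def find_hanging_command(commands, markers_written):
    # Markers are injected before and after each command. The first command
    # whose leading marker was written but whose trailing marker was not is
    # the one that caused the hang.
    for i, cmd in enumerate(commands):
        before, after = 2 * i, 2 * i + 1
        if before in markers_written and after not in markers_written:
            return cmd
    return None  # all trailing markers present: no hang detected

# Simulate a hang in the draw: its leading marker (index 2) was written,
# but the trailing one (index 3) never was.
commands = ["clear", "draw", "present"]
print(find_hanging_command(commands, markers_written={0, 1, 2}))  # draw
```

Logging the marker buffer after a device removal lets you run exactly this inference offline.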
Here is what the current Lua support in the functional tests looks like:
DX11 refactor: we re-wrote the DX11 run-time a few times and ended up with the most straightforward version. This version only recently shipped in Hades along with the Vulkan run-time on PC.
YUV support: we now have YUV support for all our Vulkan API platforms: PC, Linux, Android, and Switch. There is a new functional test for YUV, and it runs on all these platforms:
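For reference, here is the standard BT.601 full-range YUV-to-RGB conversion math (the textbook formula, not The Forge's implementation, which samples YUV planes on the GPU):

```python
def yuv_to_rgb(y, u, v):
    # BT.601 full-range conversion; u and v are chroma values centered
    # at 128, y is luma in [0, 255].
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)

    def clamp(c):
        return max(0, min(255, round(c)))

    return clamp(r), clamp(g), clamp(b)

print(yuv_to_rgb(128, 128, 128))  # mid grey: (128, 128, 128)
```

In a renderer the same coefficients usually live in a fragment shader that samples the luma and chroma planes of the video frame.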
Audio: we removed the audio functional test. It was the only test that was released unfinished and didn't run on all our platforms. Our customers show love for FMOD ... it would make more sense to show an integration of that.
GitHub issues fixed:
Numerous other fixes ...
Published by AntoineConffx about 4 years ago
Here is a screenshot of Hades running on Switch:
Here is an article by Forbes about Hades being at the top of the Nintendo Switch Charts.
Hades is also a technology showcase for Intel's integrated GPUs on macOS and Windows. The game's target audience often owns those GPUs.
Here are the screenshots:
Windows:
macOS:
Linux:
Here are the screenshots:
Windows final scene:
Without denoising:
With denoising:
PS4:
macOS:
Published by AntoineConffx about 4 years ago
Here are the screenshots:
File system: our old file system was designed more for tools or Windows applications than for games. It consumed more memory than the whole rendering system and used Windows file methods extensively. That is the opposite of what you want in a game. It took us several months to correct the mistake and come up with a file system that is tailored towards games. That means the interface changed substantially. Thanks to all those who pointed this out. Sometimes it takes a couple of iterations to land on a design that is efficient.
If you look at the new interface there are still path related functions in there. They will be removed step-by-step.
Please check out the new file system interface and let us know what you think.
Android Vulkan: validation layer is now supported
Published by AntoineConffx about 4 years ago
Mobile Devices: DPI scaling is properly handled now so we shouldn't see messed up UI anymore on mobile devices
Android: the following Unit-tests are now included for Android:
gamepad support: tested with PS4 controller
sample size reduction
proper closing of apps with the back button
proper handling of vSync
.zip filesystem handling
shader compile #include directive support
overall stability improvements
improved swapchain creation process and proper handling of current frame index
Linux:
Published by AntoineConffx about 4 years ago
MTuner
MTuner was integrated into the Windows 10 runtime of The Forge following a request for more in-depth memory profiling capabilities by one of the developers we support. It has been adapted to work closely with our framework and its existing memory tracking capabilities to provide a complete picture of a given application’s memory usage.
To use The Forge’s MTuner functionality, simply drag and drop the .MTuner file generated alongside your application’s executable into the MTuner host app, and you can immediately begin analyzing your program’s memory usage. The intuitive interface and exhaustive supply of allocation info contained in a single capture file make it easy to identify usage patterns and hotspots, as well as track memory leaks down to the file and line number. The full documentation of MTuner can be found [here](https://milostosic.github.io/MTuner/).
Currently, this feature is only available on Windows 10, but support for additional platforms provided by The Forge is forthcoming.
Here is a screenshot of an example capture done on our first Unit Test, 01_Transformations:
Published by AntoineConffx about 4 years ago
Most of us are working from home now due to the Covid-19 outbreak. We are all trying to balance life and work in new ways. Since the last release we made a thorough pass through the macOS / iOS run-time, so that it is easier to make macOS your main development environment for games.
Unit-tests fixes:
Metal runtime fixes:
Closed Issues:
Published by AntoineConffx about 4 years ago
Based on request we are providing a Path Tracing Benchmark in 16_RayTracing. It allows you to compare the performance of three platforms:
We believe that every benchmarking tool should be open-source, so that everyone can see what the source code is doing. We will extend this benchmark to the non-public platforms we support to compare the PC performance with console performance.
The benchmark comes with batch files for all three platforms. Each run generates an HTML output file from the microprofiler that is integrated into TF. The default number of iterations is 64, but you can adjust that. There is a Readme file in the 16_RayTracing folder that describes the options.
Windows DirectX 12 DXR, GeForce RTX 2070 Super, 3840x1600, NVIDIA Driver 441.99
Windows Vulkan RTX, GeForce RTX 2070 Super, 3840x1600, NVIDIA Driver 441.99
Linux Vulkan RTX, Geforce RTX 2060, 1920x1080, NVIDIA Driver 435.21
We will adjust the output of the benchmark to what users request.
Published by AntoineConffx about 4 years ago
This release took much longer than expected ... :-)
Published by AntoineConffx almost 5 years ago
The Forge now has support for Sparse Virtual Textures on Windows and Linux with DirectX 12 / Vulkan. Sparse texture (also known as "virtual texture", "tiled texture", or "mega-texture") is a technique for loading huge textures (such as 16k x 16k or larger) into GPU memory.
It breaks the original texture down into small square or rectangular tiles so that only the visible parts are loaded.
The unit test 18_Virtual_Texture is using 7 sparse textures:
There is a unit test that shows a solar system where you can approach planets with Sparse Virtual Textures attached; the resolution of the texture increases as you get closer.
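The memory win comes from committing GPU memory at tile granularity. A quick back-of-the-envelope sketch, assuming 128x128-texel tiles and 4 bytes per texel (typical values, not confirmed by the text):

```python
def tile_grid(tex_w, tex_h, tile=128):
    # Number of tiles covering the texture in each dimension.
    return ((tex_w + tile - 1) // tile, (tex_h + tile - 1) // tile)

def resident_bytes(visible_tiles, tile=128, bpp=4):
    # Only visible tiles are backed by committed GPU memory.
    return visible_tiles * tile * tile * bpp

tiles_x, tiles_y = tile_grid(16384, 16384)  # 128 x 128 grid of tiles
full = 16384 * 16384 * 4                    # 1 GiB if fully resident
partial = resident_bytes(200)               # ~12.5 MiB for 200 visible tiles
print(tiles_x * tiles_y, full // 2**20, partial // 2**20)  # 16384 1024 12
```

Even before mip levels are considered, keeping only a few hundred visible tiles resident reduces a 1 GiB texture to a few megabytes of committed memory.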
Linux 1080p NVIDIA RTX 2060 Vulkan Driver version 435
Windows 10 1080p AMD RX550 DirectX 12 Driver number: Adrenaline software 19.10.1
Windows 10 1080p NVIDIA 1080 Vulkan Driver number: 418.81
Ephemeris 2 - the game Stormland from Insomniac was released. This game uses a custom version of Ephemeris 2. We worked on this project for more than six months.
Head over to Custom Middleware to check out the source code.