The Forge Cross-Platform Rendering Framework PC Windows, Steamdeck (native), Ray Tracing, macOS / iOS, Android, XBOX, PS4, PS5, Switch, Quest 2
APACHE-2.0 License
Published by AntoineConffx about 2 months ago
We helped Skydance Interactive optimize Behemoth last year. Click on the image below to see the announcement trailer:
This unit test is based on some of our research into software rasterization and GPU-driven rendering: a particle system running entirely in very few compute shaders, with one large buffer holding most of the data. As with all things GPU-driven, the trick is to execute one compute shader once over one buffer to reduce read/write memory bandwidth. Although this is not new wisdom, you would be surprised how many particle systems still get this wrong ... having a compute shader for each stage of the particle lifetime, or even worse, doing most of the particle work on the CPU.
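The single-buffer, single-dispatch idea can be illustrated with a hedged CPU sketch (all names and the particle layout below are invented for illustration, not the actual unit-test code): every particle slot lives in one buffer, and one pass per frame handles emission, simulation, and aging, exactly the way one compute dispatch would touch each slot once.

```cpp
#include <cassert>
#include <vector>

// Hypothetical particle layout: one large buffer holds all state.
// age < 0 marks a dead/free slot that the same pass can re-emit into.
struct Particle {
    float pos[3];
    float vel[3];
    float age;      // seconds alive; < 0 means dead / free slot
    float lifetime; // seconds until the slot is recycled
};

// One "dispatch": every slot is read and written exactly once per frame,
// covering emission, integration, and expiry in a single pass.
void updateParticles(std::vector<Particle>& buf, float dt) {
    for (Particle& p : buf) {            // one GPU thread per particle
        if (p.age < 0.0f) {              // dead slot: emit a new particle
            p = Particle{{0, 0, 0}, {0, 1, 0}, 0.0f, 5.0f};
            continue;
        }
        p.age += dt;
        if (p.age >= p.lifetime) {       // expired: mark the slot free
            p.age = -1.0f;
            continue;
        }
        for (int i = 0; i < 3; ++i)      // integrate in the same pass
            p.pos[i] += p.vel[i] * dt;
    }
}
```

Calling `updateParticles` once per frame is the CPU analogue of the single compute dispatch; no separate emit/simulate/kill shaders and no extra buffer traffic between stages.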
This particle system was demoed in several talks last September, running on a Samsung S22. Here are the slides:
http://www.conffx.com/WolfgangEngelParticleSystem.pptx
It is meant to be used to implement next-gen mega particle systems that always simulate hundreds of thousands or millions of particles at once, instead of the few dozen that contemporary systems simulate.
This screenshot shows 4 million firefly-like particles, with 10,000 lights attached to them and a shadow for the directional light. Those numbers were previously thought to be impossible on mobile phones.
Same setting as above, but this time with eight point-light shadows in addition.
We now have the new compute-based TVB 2.0 approach running on all platforms (on Android, only on the S22 so far). You can download the slides from the I3D talk from
Published by AntoineConffx 6 months ago
We are giving a talk about our latest Visibility Buffer research at I3D. Here is a short primer on what it is about:
The original idea of the Triangle Visibility Buffer is based on an article by [burns2013]. [schied2015] and [schied2016] extended what was described in the original article. Christoph Schied implemented a modern version with an early version of OpenGL (supporting MultiDrawIndirect) in The Forge rendering framework in September 2015.
We ported this code to all platforms and simplified and extended it in the following years by adding a triangle filtering stage following [chajdas] and [wihlidal17] and a new way of shading.
Our ongoing improvements simplified the approach incrementally, and the architecture started to resemble what was described in the original article by [burns2013] again, leveraging the modern tools of the newer graphics APIs.
In contrast to [burns2013], the actual storage of triangles in our implementation of a Visibility Buffer happens after the triangle removal and draw compaction step, with an optimally “massaged” data set.
Having removed overdraw in the Visibility Buffer and Depth Buffer, we run a shading approach that shades everything with one regular draw call. We call this shading stage Forward++ due to its resemblance to forward shading and its usage of a tiled light list for applying many lights. It is a step up from Forward+, which requires numerous draw calls.
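The tiled light list that Forward++ relies on can be sketched as follows (a hedged CPU illustration; tile size, the screen-space light representation, and all names are made up, not The Forge's actual implementation): the screen is divided into tiles, each tile collects the indices of lights whose radius touches it, and shading then only loops over that short per-tile list.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Screen-space point light, simplified for illustration.
struct PointLight { float x, y, radius; };

// Bin every light into each screen tile its bounding square overlaps.
// The shading pass later reads only tiles[tileIndex] per pixel.
std::vector<std::vector<int>> buildTileLightLists(
        const std::vector<PointLight>& lights,
        int width, int height, int tileSize) {
    int tilesX = (width + tileSize - 1) / tileSize;
    int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> tiles(tilesX * tilesY);
    for (int li = 0; li < (int)lights.size(); ++li) {
        const PointLight& l = lights[li];
        // conservative tile range covered by the light's bounding square
        int x0 = std::max(0, (int)((l.x - l.radius) / tileSize));
        int x1 = std::min(tilesX - 1, (int)((l.x + l.radius) / tileSize));
        int y0 = std::max(0, (int)((l.y - l.radius) / tileSize));
        int y1 = std::min(tilesY - 1, (int)((l.y + l.radius) / tileSize));
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                tiles[ty * tilesX + tx].push_back(li);
    }
    return tiles;
}
```

With thousands of lights, each pixel only ever iterates the handful of indices in its own tile, which is what makes a single full-screen shading draw feasible.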
We described all this in several talks at game industry conferences, for example at GDCE 2016 [engel16] and during XFest 2018, showing considerable performance gains due to reduced memory bandwidth compared to traditional G-buffer-based rendering architectures.
A blog post, updated over the years, about what we now call Triangle Visibility Buffer 1.0 (TVB 1.0) can be found here: [engel18].
Over the last years we extended this original idea with an Order-Independent Transparency approach (it is more efficient to sort triangle IDs in a per-pixel linked list than to store layers of a G-buffer) and software VRS. We then developed a Visibility Buffer approach that doesn't require draw calls to fill the depth and Visibility Buffer, and one that requires far fewer draw calls in parallel.
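The per-pixel linked list mentioned above can be sketched on the CPU (a hedged illustration with invented names; on the GPU the counter bump is an atomic add and the head write an atomic exchange): each pixel keeps a list head into a shared node pool of (triangle ID, depth) entries, and the resolve pass sorts each pixel's short list instead of storing shaded G-buffer layers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One node per transparent fragment: which triangle, how deep, and the
// previous list head so nodes chain into a per-pixel linked list.
struct OITNode { uint32_t triangleId; float depth; int32_t next; };

struct OITBuffers {
    std::vector<int32_t> head;   // one list head per pixel, -1 = empty
    std::vector<OITNode> pool;   // shared node pool (atomic counter on GPU)
    explicit OITBuffers(int pixels) : head(pixels, -1) {}
};

// Fragment-shader equivalent: push a new node onto the pixel's list.
void insertFragment(OITBuffers& b, int pixel, uint32_t triId, float depth) {
    int32_t node = (int32_t)b.pool.size();         // atomicAdd on GPU
    b.pool.push_back({triId, depth, b.head[pixel]});
    b.head[pixel] = node;                          // atomic exchange on GPU
}

// Resolve pass: gather the pixel's fragments, sort back-to-front by depth,
// and return the triangle IDs in compositing order.
std::vector<uint32_t> resolvePixel(const OITBuffers& b, int pixel) {
    std::vector<OITNode> frags;
    for (int32_t n = b.head[pixel]; n != -1; n = b.pool[n].next)
        frags.push_back(b.pool[n]);
    std::sort(frags.begin(), frags.end(),
              [](const OITNode& a, const OITNode& c) { return a.depth > c.depth; });
    std::vector<uint32_t> order;
    for (const OITNode& f : frags) order.push_back(f.triangleId);
    return order;
}
```

Sorting a few small IDs per pixel is cheap compared to the bandwidth of writing and re-reading full G-buffer layers per transparency level.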
This release offers what we call an updated Triangle Visibility Buffer 1.0 (TVB 1.0) and a prototype of the Triangle Visibility Buffer 2.0 (TVB 2.0).
The changes to TVB 1.0 are evolutionary. We used to map each mesh to an indirect draw element. This required the use of DrawID to map back to the per-mesh data. When working on a game engine with a very high number of draw calls, it imposed a limitation on the number of "draws" we could do, due to having only a limited number of bits available in the Visibility Buffer.
Additionally, instancing was implemented using a separate instanced draw for each instanced mesh. We refactored the data flow between the draws and the shade pass.
There is now no reliance on DrawID and instances are handled transparently using the same unified draw. This both simplifies the flow of data and allows us to draw more "instanced" meshes.
Apart from being able to use a very high number of draw calls, the performance didn't change.
The new TVB 2.0 approach is revolutionary in the sense that it doesn't use draw calls anymore to fill the depth and visibility buffer. Two compute shader invocations filter triangles and eventually fill the depth and visibility buffer.
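One common way a compute shader can fill depth and visibility without draw calls is to pack depth into the high bits and the triangle ID into the low bits of a single 64-bit value, then take a per-pixel minimum (an atomic min on the GPU). The sketch below illustrates that packing trick on the CPU; the encoding and names are our own illustration, not necessarily what the TVB 2.0 prototype does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One 64-bit value per pixel: depth in the high 32 bits, triangle ID in
// the low 32 bits, so a plain numeric min keeps the closest triangle.
using VisPixel = uint64_t;

inline VisPixel packDepthId(float depth01, uint32_t triangleId) {
    uint64_t d = (uint64_t)(depth01 * 4294967295.0); // quantize to 32 bits
    return (d << 32) | triangleId;                   // smaller == closer
}

inline uint32_t unpackId(VisPixel v)    { return (uint32_t)(v & 0xffffffffu); }
inline float    unpackDepth(VisPixel v) { return (float)((v >> 32) / 4294967295.0); }

// "Rasterize" one fragment: min keeps the nearest triangle per pixel.
// On the GPU this is a 64-bit atomic min; no draw call is involved.
inline void writeFragment(std::vector<VisPixel>& vb, int pixel,
                          float depth01, uint32_t triangleId) {
    VisPixel candidate = packDepthId(depth01, triangleId);
    if (candidate < vb[pixel]) vb[pixel] = candidate;
}
```

After all fragments are written, the buffer simultaneously holds the depth buffer (high bits) and the visibility buffer (low bits) for the shading pass to consume.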
Not using draw calls anymore makes the whole code base more consistent and less convoluted compared to TVB 1.0.
You can now find the new Visibility Buffer 2 approach in
The-Forge\Examples_3\Visibility_Buffer2
This is still in an early stage of development. We only support a limited number of platforms: Windows D3D12, PS4/5, XBOX, and macOS / iOS.
We cleaned up the whole initRenderer code, merged GPUConfig into GraphicsConfig, and unified naming.
We improved the Metal Validation Support.
Everything related to Art assets is now in the Art folder.
Lots of fixes everywhere.
References:
[burns2013] Christopher A. Burns, Warren A. Hunt, "The Visibility Buffer: A Cache-Friendly Approach to Deferred Shading", 2013, Journal of Computer Graphics Techniques (JCGT) 2:2, Pages 55 - 69.
[schied2015] Christoph Schied, Carsten Dachsbacher, "Deferred Attribute Interpolation for Memory-Efficient Deferred Shading" , Kit Publication Website: http://cg.ivd.kit.edu/publications/2015/dais/DAIS.pdf
[schied2016] Christoph Schied, Carsten Dachsbacher, "Deferred Attribute Interpolation Shading", 2016, GPU Pro 7, Pages
[chajdas] Matthaeus Chajdas, GeometryFX, 2016, AMD Developer Website http://gpuopen.com/gaming-product/geometryfx/
[wihlidal17] Graham Wihlidal, "Optimizing the Graphics Pipeline with Compute", 2017, GPU Zen 1, Pages 277--320
[engel16] Wolfgang Engel, "4K Rendering Breakthrough: The Filtered and Culled Visibility Buffer", 2016, GDC Vault: https://www.gdcvault.com/play/1023792/4K-Rendering-Breakthrough-The-Filtered
[engel18] Wolfgang Engel, "Triangle Visibility Buffer", 2018, Wolfgang Engel's Diary of a Graphics Programmer Blog http://diaryofagraphicsprogrammer.blogspot.com/2018/03/triangle-visibility-buffer.html
Published by AntoineConffx 7 months ago
We are sponsoring I3D again. Come by and say hi! We will also be giving a talk on the new developments around the Triangle Visibility Buffer.
We have worked on Warzone Mobile since August 2020. The game launched on March 21, 2024.
We removed CPU cluster culling and simplified the animation data usage. Now triangle filtering only takes one dispatch each frame again.
We integrated the Swappy frame pacer into the Android / Vulkan ecosystem.
We did another pass on the GPUCfg system; now we can generate the vendor IDs and model IDs with a Python script to keep the *_gpu.data list easily up to date for each platform.
We removed most of the name comparisons and replaced them with ID comparisons, which should speed up parsing time and is more specific.
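The difference is easy to see in a small sketch (names and model IDs below are illustrative; the vendor IDs shown are the well-known PCI vendor IDs, but the struct is not the actual GPUCfg code): an ID match is two integer compares, while a name match means substring-searching marketing strings that vary between drivers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative GPU identity: PCI vendor ID plus device/model ID.
struct GpuId { uint32_t vendorId; uint32_t modelId; };

// Exact and fast: two integer comparisons, no ambiguity.
inline bool sameGpu(const GpuId& a, const GpuId& b) {
    return a.vendorId == b.vendorId && a.modelId == b.modelId;
}

// The fragile alternative this replaces: matching on a human-readable
// device name, which is slower and depends on driver naming.
inline bool nameMatches(const std::string& deviceName, const std::string& needle) {
    return deviceName.find(needle) != std::string::npos;
}
```

A generated ID table also survives driver renames ("Radeon RX 570" vs. "AMD Radeon (TM) RX 570"), which is exactly where string matching breaks down.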
We added screen-space shadows to the set of shadow approaches in that unit test. These are complementary to regular shadow mapping and add more detail. We also fixed a number of inconsistencies with the other shadow map approaches.
PS5 - Screen-Space Shadows on
PS5 - Screen-Space Shadows off
Nintendo Switch
PS4
Now you can have GPU crash reports on all platforms. We skipped OpenGL ES and DX11, though.
A simple example of a crash report is this:
2024-04-04 23:44:08 [MainThread ] 09a_HybridRaytracing.cp:1685 ERR| [Breadcrumb] Simulating a GPU crash situation (RAYTRACE SHADOWS)...
2024-04-04 23:44:10 [MainThread ] 09a_HybridRaytracing.cp:2428 INFO| Last rendering step (approx): Raytrace Shadows, crashed frame: 2
We will extend the reporting a bit more over time.
Published by AntoineConffx 8 months ago
We improved Ephemeris again and now support it on more platforms, updating some of the algorithms used and adding more features.
We now support PC, XBOXes, PS4/5, Android, Steamdeck, and iOS (requires iPhone 11 or higher); Switch is not supported so far.
Ephemeris on XBOX Series X
Ephemeris on Android
Ephemeris on PS4
Ephemeris on PS5
We changed the graphics interface for cmdBindRenderTargets
// old
DECLARE_RENDERER_FUNCTION(void, cmdBindRenderTargets, Cmd* pCmd, uint32_t renderTargetCount, RenderTarget** ppRenderTargets, RenderTarget* pDepthStencil, const LoadActionsDesc* loadActions, uint32_t* pColorArraySlices, uint32_t* pColorMipSlices, uint32_t depthArraySlice, uint32_t depthMipSlice)
// new
DECLARE_RENDERER_FUNCTION(void, cmdBindRenderTargets, Cmd* pCmd, const BindRenderTargetsDesc* pDesc)
Instead of a long list of parameters, we now provide a struct that gives us enough flexibility to pack more functionality in there.
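A hedged sketch of what the parameter-struct pattern buys (the field names below are illustrative stand-ins, not the actual BindRenderTargetsDesc members): a struct with sane defaults lets callers set only what they need, and new options can be added later without breaking every call site, unlike a positional parameter list.

```cpp
#include <cassert>
#include <cstdint>

struct RenderTarget; // opaque, as in the real interface

// Illustrative desc struct: defaulted fields mean a depth-only pass can
// leave everything else untouched. The real struct's fields may differ.
struct BindRenderTargetsDescSketch {
    uint32_t       renderTargetCount = 0;
    RenderTarget** ppRenderTargets   = nullptr;
    RenderTarget*  pDepthStencil     = nullptr;
    uint32_t       depthArraySlice   = 0;
    uint32_t       depthMipSlice     = 0;
};

// Stand-in for the real function: validates the desc it consumes.
// Growing the struct never changes this signature.
bool cmdBindRenderTargetsSketch(const BindRenderTargetsDescSketch* pDesc) {
    if (!pDesc) return false;
    return pDesc->renderTargetCount == 0 || pDesc->ppRenderTargets != nullptr;
}
```

A caller doing a depth-only pass simply zero-initializes the struct and sets `pDepthStencil`, while the old signature forced it to spell out every unused parameter.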
We added Variable Rate Shading to the Visibility Buffer OIT example test 15a. This way we have a better looking test scene with St. Miguel.
VRS allows rendering parts of the render target at different resolution based on the auto-generated VRS map, thus achieving higher performance with minimal quality loss. It is inspired by Michael Drobot's SIGGRAPH 2020 talk: https://docs.google.com/presentation/d/1WlntBELCK47vKyOTYI_h_fZahf6LabxS/edit?usp=drive_link&ouid=108042338473354174059&rtpof=true&sd=true
The key idea behind the software-based approach is to render everything into 4xMS targets and use a stencil buffer as a VRS map. The VRS map is automatically generated based on the local image gradients.
It can be used on a far wider range of platforms and devices than the hardware-based approach, since hardware VRS support is broken or missing on many platforms. Because this software approach utilizes 2x2 tiles, we can also achieve higher image quality compared to hardware-based VRS.
Shading rate view based on the color per 2x2 pixel quad:
PC
Debug Output with the original Image on PC
PC
Debug Output with the original Image on PC
Android
Debug Output with the original Image on Android
Android
Debug Output with the original Image on Android
UI description:
Supported platforms:
PS4, PS5, all XBOXes, Nintendo Switch, Android (Galaxy S23 and higher), Windows(Vulkan/DX12), macOS/iOS.
You will want to check out those files. They are now dedicated per supported platform, so it is easier for us to distinguish between the different PlayStations, XBOXes, Switches, Android, iOS, etc.
The Unlinked Multi GPU example was broken on AMD 7x GPUs with Vulkan. This looks like a bug. We had to disable DCC to make that work.
We now track GPU memory and will extend this to other platforms.
We now support the VK_EXT_ASTC_DECODE_MODE_EXTENSION_NAME extension.
Various bug fixes to make this more stable. Still alpha ... will crash.
Published by AntoineConffx 9 months ago
Our last release was in October 2022. We were so busy that we lost track of time. In March 2023 we planned to make the next release, and we kept testing, fixing, and improving code up until today. The improvements coming back from the 8-10 projects we typically work on in parallel were so many that it was hard to integrate, test, and then maintain them all. To a certain degree our business has higher priority than making GitHub releases, but we realize that letting a lot of time pass makes it substantially harder to get the whole code base back in shape, even with a company of nearly 40 graphics programmers. So we cut down on functional and unit tests to have fewer variables. We also restructured large parts of our code base so that it is easier to maintain. One of the constant maintenance challenges was the macOS / iOS run-time (more about that below).
We invested a lot in our testing environment. We have more consoles for testing now, and we also have a much-needed screenshot testing system. We now outsource more testing to external service providers. We removed Linux as a stand-alone target, but the native Steamdeck support should make up for this.
We tried to be conservative about increasing API versions because we know that on many platforms our target group will use older OS or API implementations. Nevertheless, we were more adventurous this year than before, so we bumped versions up by a larger step than in previous years.
Our next release is planned for about four weeks from now. We still have work to do to bring up a few source code parts, but now the increments are much smaller.
In the meantime some of the games we worked on, or are still working on, shipped:
Forza Motorsport has launched in the meantime:
Starfield has launched:
No Man's Sky has launched on macOS:
This unit test demonstrates a software-based variable rate shading (VRS) technique that allows rendering parts of the render target at different resolutions based on an auto-generated VRS map, thus achieving higher performance with minimal quality loss. It is inspired by Michael Drobot's SIGGRAPH 2020 talk: https://docs.google.com/presentation/d/1WlntBELCK47vKyOTYI_h_fZahf6LabxS/edit?usp=drive_link&ouid=108042338473354174059&rtpof=true&sd=true
PC Windows (2560x1080):
Switch (1280x720):
XBOX One S (1080p):
PS4 Pro (3840x2160):
The key idea behind the software-based approach is to render everything in 4xMS targets and use a stencil buffer as a VRS map. The VRS map is automatically generated based on the local image gradients.
The advantage of this approach is that it runs on a wider range of platforms and devices than the hardware-based approach since the hardware VRS support is broken or not supported on many platforms. Because this software approach utilizes 2x2 tiles we can also achieve higher image quality compared to hardware-based VRS.
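The gradient-based map generation can be sketched as follows (a hedged illustration: the thresholds, the luminance input, and the rate encoding are invented, not the shipped shader): per 2x2 quad, measure horizontal and vertical luminance deltas and pick a coarser shading rate wherever the image is locally flat in that direction.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative shading rates: how many pixels one shaded sample covers.
enum ShadingRate { RATE_1X1, RATE_2X1, RATE_1X2, RATE_2X2 };

// Classify one 2x2 quad of a luminance image (row-major, 'width' wide).
// Low gradient in a direction means shading can be coarsened there.
ShadingRate rateForQuad(const std::vector<float>& luma, int width,
                        int x, int y, float threshold) {
    float tl = luma[y * width + x],       tr = luma[y * width + x + 1];
    float bl = luma[(y + 1) * width + x], br = luma[(y + 1) * width + x + 1];
    float dx = std::fabs(tl - tr) + std::fabs(bl - br); // horizontal detail
    float dy = std::fabs(tl - bl) + std::fabs(tr - br); // vertical detail
    if (dx < threshold && dy < threshold) return RATE_2X2; // flat: coarsest
    if (dx < threshold) return RATE_2X1;  // flat horizontally only
    if (dy < threshold) return RATE_1X2;  // flat vertically only
    return RATE_1X1;                      // detailed: full rate
}
```

In the real pipeline the result of this classification is written into the stencil buffer, which then acts as the VRS map for the 4xMS resolve.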
Shading rate view based on the color per 2x2 pixel quad:
UI description:
Supported platforms:
PS4, PS5, all XBOXes, Nintendo Switch, Android (Galaxy S23 and higher), Windows(Vulkan/DX12).
Implemented on macOS/iOS, but it doesn't give the expected performance benefits due to an issue with stencil testing on that platform.
To enable re-compilation of shaders during run-time, we implemented a cross-platform shader server that allows recompiling shaders by pressing CTRL-S or a button in a dedicated menu.
You can find the documentation in the Wiki in the FSL section.
When working remotely, on mobile, or on console, it can be cumbersome to control the development UI.
We added a remote control application in Common_3\Tools\UIRemoteControl which allows control of all UI elements on all platforms.
It works as follows:
This is alpha software so expect it to crash ...
This extension allows developers to query for additional information on GPU faults which may have caused device loss, and to generate binary crash dumps.
We switched to Ray Queries for the common Ray Tracing APIs on all the platforms we support. The current Ray Tracing APIs substantially increase the amount of memory necessary, decrease performance, and can't add much visually, because the whole game has to run with lower resolution, lower texture resolution, and lower graphics quality (to make up for this, upscalers were introduced that add new issues to the final image).
Because Ray Tracing became a marketing term valuable to GPU manufacturers, some game developers now support Ray Tracing to help increase hardware sales. So we are going with the flow here by offering those APIs.
macOS (1440x810)
PS5 (3840x2160)
Windows 10 (2560x1080)
XBOX One Series X (1920x1080)
iPhone 11 (Model A2111) at resolution 896x414
We do not have a denoiser for the Path Tracer.
This is a cross-platform system that can track GPU capabilities on all platforms and switch features of a game on and off per platform. To read more about this, follow the link below.
We think the Metal API is a well-balanced graphics API that walks the line between low-level and high-level very well. We ran into one general problem with the Metal API on both platforms: it is hard to maintain the code base. There is an architectural problem that was probably introduced due to a lack of experience in shipping games.
In essence what Apple decided to do is have calls like this:
https://developer.apple.com/documentation/swift/marking-api-availability-in-objective-c
Anything a hardware vendor describes as available and working might not be working with the next upgrade of the operating system, hardware or just the API or XCode.
If you have a few hundred of those macros in your code, it becomes a lottery what works and what does not on a variety of hardware. On some hardware one thing is broken, on other hardware something else.
So there are two ways to deal with this: for every @available macro, you start adding a #define to switch off or replace that code based on the underlying hardware and software platform. You would have to manually track whether what the macro says is true across a wide range of platforms with different outcomes.
For example, on macOS 10.13 running on a certain MacBook Pro (a made-up example) with an Intel GPU a feature is broken, but a very similar MacBook Pro that additionally has a dedicated GPU actually runs it. Now you have to track what "class of MacBook Pro" we are talking about and whether the MacBook Pro in question has an Intel or an AMD GPU.
We track all this data already so that is not a problem. We know exactly what piece of hardware we are looking at (see above GPU Config system).
The problem is that we would have to guard every @available macro with some of this. From a QA standpoint, that generates an explosion of QA requests. To cut down on the number of variables, we decided to focus only on calls that are available in two different macOS and two different iOS versions. Here is the code in iOperatingSystem.h:
// Support fixed set of OS versions in order to minimize maintenance efforts
// Two runtimes. Using IOS in the name as iOS versions are easier to remember
// Try to match the macOS version with the relevant features in iOS version
#define IOS14_API API_AVAILABLE(macos(11.0), ios(14.1))
#define IOS14_RUNTIME @available(macOS 11.0, iOS 14.1, *)
#define IOS17_API API_AVAILABLE(macos(14.0), ios(17.0))
#define IOS17_RUNTIME @available(macOS 14.0, iOS 17.0, *)
We were one of the big proponents of the Dynamic Rendering extension. As game developers, we took over part of the driver development by adopting Vulkan as a graphics API. The cost of game production rose substantially because our QA efforts had to be increased to deal with an API that is lower level.
One of the interesting findings made by many who adopted Vulkan in games is that a Vulkan run-time in most cases runs slower on the GPU than a DirectX 11 run-time. It is very hard to optimize a Vulkan run-time to run as fast on the GPU as a DirectX 11 run-time, because parts of the responsibilities shifted from the device driver writer to the game developer. The main advantage of using Vulkan is the lower CPU overhead: while older DirectX 11 drivers were so inefficiently programmed that they bogged down high-end PC CPUs and no longer allowed running two GPUs in SLI / Crossfire (hence the practical death of multi-GPU support for games), Vulkan has a substantially lower CPU overhead.
That being said, the one thing that shouldn't have been moved from the driver into the game developer's space is the render pass concept. It is so close to the hardware that device driver writers can deal with it much better than game developers. In our measurements on tiled hardware renderers, using tiles never had a positive effect. We can imagine that a device driver writer with direct access to the hardware can make tiles run much faster than we can.
This is why we embrace the dynamic rendering extension.
glTF is an art exchange format, not a format suitable for loading game assets. There are mostly two reasons for this: it doesn't make sense to load one glTF file for each "model", and it also doesn't make sense to load one glTF file for a whole scene, because that would not allow streaming. Generally, the way we load art assets is one large zipped file opened with one "fopen" call. To avoid any other OS calls, we load all the assets from this large file by looking into a look-up table, finding the address in memory, pointing there, and then copying the data into system memory (a simplified view, but it should suffice for this purpose). To allow streaming, depending on the type of game, we pre-package art assets in meaningful ways, so when we load on demand they are laid out as expected.
glTF does not align with this concept or the fact that we compress data to fit into our internal caches. In other words it doesn't make sense to use glTF during run-time of a game.
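The look-up-table loading scheme described above can be sketched like this (a hedged illustration with an invented in-memory format; the actual package layout and names in The Forge differ): one read brings the whole package into memory, and each asset is then found via a table of offsets with no further OS calls per asset.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative in-memory package: the single big file plus a table that
// maps asset names to (offset, size) pairs inside the blob.
struct PackedFile {
    std::vector<unsigned char> blob; // contents of the one large file
    std::unordered_map<std::string, std::pair<size_t, size_t>> table;
};

// Load one asset: look up its offset, point into the blob, copy out.
// No fopen/read per asset -- just a table lookup and a memcpy.
std::vector<unsigned char> loadAsset(const PackedFile& pak,
                                     const std::string& name) {
    auto it = pak.table.find(name);
    if (it == pak.table.end()) return {};
    const size_t offset = it->second.first;
    const size_t size   = it->second.second;
    std::vector<unsigned char> out(size);
    std::memcpy(out.data(), pak.blob.data() + offset, size);
    return out;
}
```

Pre-packaging assets in streaming-friendly order then amounts to deciding which entries sit next to each other in the blob.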
The Asset Pipeline now loads, converts, and optimizes glTF data into our internal format offline. This can be adjusted to the needs of any type of game and currently represents a proof of concept.
Today there is no point in using Sponza as a test scene anymore. One can argue that even St. Miguel does not fulfill that purpose; nevertheless, it is currently the best we have. We removed all Sponza art assets and replaced them with St. Miguel.
We updated the Wiki documentation. Check it out. We know it could be more ...
Published by AntoineConffx 9 months ago
The Starfield Official Gameplay Reveal Trailer is out. It always brings us pleasure to see The Forge running in AAA games like this:
We added The Forge to the Creation Engine in 2019.
The Forge made an appearance during the Apple developer conference 2022. We added it to the game "No Man's Sky" from Hello Games to bring this game up on macOS / iOS. For the Youtube video click on the image below and jump to 1:22:40
We switched our Linux OS to Manjaro to have an easier upgrade path to the Steamdeck. Please note the changed Linux requirements below.
Shader byte code can now be generated offline.
Over the last few projects we always had challenges with EASTL. So over the last nine months we slowly removed it and replaced it with new C-language-based containers that prefer stack allocations over heap allocations.
For string management:
bstrlib
For dynamic arrays and hash tables:
stb_ds.h
There is a new unit test to make sure those new containers are tested. It is called 36_AlgorithmsAndContainers
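The "prefer stack allocations over heap allocations" principle can be illustrated with a hedged C++ sketch (this shows the design idea only, not the actual bstrlib or stb_ds containers): an array with a small inline buffer that never touches the heap until it outgrows that storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Small-buffer array for trivially copyable T: the first N elements live
// in inline (stack) storage; the heap is used only after overflow.
template <typename T, size_t N>
struct SmallArray {
    T      inlineBuf[N];
    T*     data  = inlineBuf;  // points at stack storage until overflow
    size_t count = 0;
    size_t cap   = N;

    void push(const T& v) {
        if (count == cap) {                          // grow: move to heap
            size_t newCap = cap * 2;
            T* heap = (T*)std::malloc(newCap * sizeof(T));
            std::memcpy(heap, data, count * sizeof(T));
            if (data != inlineBuf) std::free(data);
            data = heap;
            cap = newCap;
        }
        data[count++] = v;
    }
    bool onStack() const { return data == inlineBuf; }
    ~SmallArray() { if (data != inlineBuf) std::free(data); }
};
```

For the short-lived, small collections that dominate per-frame code, this pattern avoids allocator traffic and the cache misses of pointer-chasing into the heap.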
We changed the app life cycle: modern APIs have so many ways to reset the driver or reload assets that we made a more flexible "reload" mechanism that generalizes all the special cases we had in there before.
New Animation test that unifies most of the former animation tests into one. This way we can save some testing time in our Jenkins setup.
We added a new unit test called 38_AmbientOcclusion_GTAO. It implements the paper "Practical Real-Time Strategies for Accurate Indirect Occlusion" by Jorge Jimenez et al.
macOS
PC
PS4
PS5
Switch
XBOX
We improved the gradient calculation in the Visibility Buffer. Thanks to Stephen Hill @self_shadow who brought this to our attention.
We reorganized the whole TF directory structure to allow development in more areas. Here is an image representing the new structure:
What is still missing is the "Render Abstraction Layer", "Scene Loader" and we have to populate the "Game Layer" more.
We upgraded to ImGUI 1.88 to get access to the docking feature. In the process we improved the ImGUI integration substantially.
We started a blog for The Forge at The-Forge-Blog. We have no idea where we can find the time to write blog posts ... let's see what is happening ...
Retired Unit/Functional Tests:
Published by AntoineConffx 9 months ago
We are always looking for more graphics / engine programmers. We are also specifically looking for a consultant who can help us to scale up our hardware testing environment.
The following list of changes is not fully representative of all the improvements we made, so it is just a selection:
The test contains two projects:
How to use it: while the Main project is running, open 19_CodeHotReload_Game.cpp and make a change; lines marked with TRY_CODE_RELOAD mark easy changes to try. Once the file is saved, rebuild the project and the changes appear automatically.
Note: in this implementation we can't call any functions from The Forge from the hot-reloadable project (19a_CodeHotReload_Game). This is because we compile OS and Renderer as static libraries and link them directly to the exe. Ideally these projects should be compiled as dynamic libraries in order to expose their functionality to the exe and the hot-reloadable DLL. The reason we didn't implement it this way is that all our other projects are already set up to use static libraries.
Linux 1080p resolution
macOS 3200x1760 resolution
PS4 1080p resolution
PS5 4k resolution
Windows 10 1080p resolution
XBOX One (original) 1080p resolution
https://twitter.com/Vuthric/status/1286796950214307840
A detailed description can be found here: https://realtimevfx.com/t/smoke-lighting-and-texture-re-usability-in-skull-bones/5339
In this repository there is a "dlut.blend" file that contains a minimal volumetric render setup. To generate the DLUT image, do the following steps:
The resulting DLUT image should look like this:
The example program running on Android:
Window management - all platforms that support the concept of a windowed application now have a base file named {Platform}Window.cpp. There is now a common UI element that offers multi-monitor support (where supported) and various window settings. There are also LUA scripts that test this functionality in our Jenkins setup.
Android Vulkan Validation layers: we added the validation layer from Khronos GitHub repo as they have stopped shipping the layer in the NDK.
Android Vulkan Validation Layers
You can find them in ThirdParty/OpenSource/AndroidVulkanValidationLayers
This library is a stepping stone toward utilizing more CPU intrinsics on various platforms. You can see its results in the screenshots above, showing the name of the CPU and the supported instruction set. We now also show the GPU name and the driver version the GPU uses.
Upgraded Vulkan and DX Allocators: following the updates to these open-source libraries on GitHub we upgraded our code base accordingly.
macOS / iOS - while working with TF on various projects, we bring back improvements and lessons learned from those projects. You will find numerous macOS / iOS improvements in this release.
For one of the business applications we worked on, we needed double-precision math, so we extended the math library accordingly.
We also improved the input system with HID support, which is an ongoing effort. So, better controller support on more platforms ...
Retired unit tests: we are going to retire many unit tests now because our automated testing cycle takes too long and heats up the "engine" room (see the passage above about us looking for a consultant to scale up our testing environment). Today we retire:
Resolved GitHub Issues:
Happy Holidays! 🎄🎅🔥🎁🧨
We wish you and your loved ones all the best for the Holidays and a Happy New Year 2022!
This update is again a mixture of things we learned while integrating The Forge and feedback and contributions from our users. Thanks for all the support!
In one of the next updates we will remove EASTL and offer dedicated containers compatible with C99. Over time, EASTL was a huge productivity drain. Its inefficient memory access patterns hogged too much CPU time in games where we integrated TF, and we always had to go back and fix those manually later.
We know this is a breaking change, but considering that the STL was a good idea for the CPUs of 20+ years ago, we would like to align more with what modern CPUs expect.
Now our build times are much better and the overall system runs faster:
CPU: Intel i7-7700k
GPU: AMD Radeon RX570
             Old ECS                     flecs
Debug:       Single Threaded: 90.0ms     Single Threaded: 23.0ms
             Multi Threaded:  29.0ms     Multi Threaded:   6.8ms
Release:     Single Threaded:  5.7ms     Single Threaded:  1.7ms
             Multi Threaded:   2.3ms     Multi Threaded:   0.9ms
Descriptor management improvements - we changed the rendering interface for all platforms: cmdBindPushConstants now takes an index instead of a name, and we also allow partial updates of array descriptors.
Borderless window - there are improvements to borderless windows support
FSL improvements
sRGB - all examples should be now more correct when it comes to linear lighting and sRGB
Android Game Extension usage: in one of our AAA game engine projects, we are now successfully using Android Game Extensions for the first time, so we brought them to The Forge as well.
We redid a lot of the Android development setup to streamline the experience a bit. We still use Visual Studio 2017 because it allows us to be more productive compared to other IDEs. Two years ago we went back and forth between IDEs but concluded that the only IDE we could efficiently integrate into our Jenkins testing setup was Visual Studio.
Please check out the new Readme below and let us know if we missed anything.
Vulkan: moved to KHR ray tracing extensions. Upgraded max spec to Vulkan SDK 1.2.162 and tested ray tracing support on an AMD RX 6700 XT GPU
meshoptimizer - somehow the integration of meshoptimizer "got lost" over time, and we just re-integrated it into the resource loader. Here are some numbers we got:
meshoptimizer on various art assets
The game engine BuildBox is now using The Forge (click on Image to go to Buildbox website):
The game Lethis Path of Progress uses The Forge now (click on image to go to the Steam Store)
Published by AntoineConffx almost 3 years ago
Unlinked multiple-GPU support: for professional visualization applications, we now support unlinked multiple GPUs.
A new renderer API is added to enumerate available GPUs.
Renderer creation is extended to allow explicit GPU selection using the enumerated GPU list.
Multiple Renderers can be created this way.
The resource loader interface has been extended to support multiple Renderers.
It is initialized with the list of all Renderers created.
To select which Renderer (GPU) resources are loaded on, the NodeIndex used in linked GPU configurations is reused for the same purpose.
Resources cannot be shared across multiple Renderers; they must be duplicated explicitly if needed.
To retrieve generated content from one GPU to another (e.g. for presentation), a new resource loader operation is provided to schedule a transfer from a texture to a buffer. The target buffer should be mappable.
This operation requires proper synchronization with the rendering work; a semaphore can be provided to the copy operation for that purpose.
Available with Vulkan and D3D12.
For other APIs, the enumeration API will not create a RendererContext, which indicates the lack of unlinked multi-GPU support.
Config.h: We now have a central config.h file that can be used to configure TF.
Common_3/OS/Core/Config.h
Common_3/Renderer/RendererConfig.h
Common_3/Renderer/{RenderingAPI}/{RenderingAPI}Config.h
* Modified PyBuild.py
* Proper handling of config options.
* Every config option has --{option-name}/--no-{option-name} flag that uses define/undef directives to enable/disable macros.
* Macros are guarded with ifndef/ifdef.
* Updated Android platform handling
* Deleted Common_3/Renderer/Compiler.h. Its functionality was moved into Config.h
* Moved all macro options to config files
* Renamed USE_{OPTION_NAME} to ENABLE_{OPTION_NAME}
* Changed some macros to be defined/not defined instead of having values of 0 or 1.
* Deleted all DISABLE_{OPTION_NAME} macros
* When detecting ray tracing, replaced ENABLE_RAYTRACING with RAYTRACING_AVAILABLE, because not all projects need ray tracing even when it is available. RendererConfig.h defines the ENABLE_RAYTRACING macro if ray tracing is available, so it can be commented out in a single place instead of having to be hunted down for every platform
* Removed most of the macro definitions from build systems. Some of the remaining macros are:
* Target platform macros: NX64, QUEST_VR
* Arm neon macro ANDROID_ARM_NEON.
* Windows suppression macros (like _CRT_SECURE_NO_WARNINGS)
* Macros specific to gainputstatic
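The --{option-name}/--no-{option-name} pattern from PyBuild.py can be sketched with argparse. This is an illustration of the pattern, not the actual PyBuild.py code:

```python
import argparse

def build_parser(options):
    # Each config option gets a paired --name / --no-name flag. The default
    # of None means "flag not given", so the macro is left untouched.
    parser = argparse.ArgumentParser()
    for opt in options:
        parser.add_argument(f"--{opt}", dest=opt, action="store_true",
                            default=None)
        parser.add_argument(f"--no-{opt}", dest=opt, action="store_false",
                            default=None)
    return parser

def emit_defines(options, args):
    # Translate parsed flags into #define / #undef directives, guarding
    # the ENABLE_{OPTION_NAME} macros.
    lines = []
    for opt in options:
        value = getattr(args, opt)
        if value is True:
            lines.append(f"#define ENABLE_{opt.upper()}")
        elif value is False:
            lines.append(f"#undef ENABLE_{opt.upper()}")
    return lines

options = ["raytracing", "profiler"]
args = build_parser(options).parse_args(["--raytracing", "--no-profiler"])
print(emit_defines(options, args))
# ['#define ENABLE_RAYTRACING', '#undef ENABLE_PROFILER']
```

Because both flags share the same `dest`, the last one given on the command line wins, and an omitted option stays `None` so the build keeps its default.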
glTF Viewer running on Android Galaxy Note 9
glTF Viewer running on iPhone 7
glTF Viewer running on Linux with NVIDIA RTX 2060
glTF Viewer running on Mac Mini M1
glTF Viewer running on PS5
glTF Viewer running on Switch
glTF Viewer running on XBOX One Original
Good read on specialization constants; the same concepts apply to function constants on Metal.
Declared at global scope using the SHADER_CONSTANT macro. Used like any regular variable after declaration.
Macro arguments:
#define SHADER_CONSTANT(INDEX, TYPE, NAME, VALUE)
Example usage:
SHADER_CONSTANT(0, uint, gRenderMode, 0);
// Vulkan - layout (constant_id = 0) const uint gRenderMode = 0;
// Metal - constant uint gRenderMode [[function_constant(0)]];
// Others - const uint gRenderMode = 0;
void main()
{
    // Can be used like regular variables in shader code
    if (gRenderMode == 1)
    {
        //
    }
}
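Since the FSL translator is written in Python, the per-platform expansion shown in the comments above can be sketched roughly like this. This is a simplified illustration, not the actual translator code:

```python
def expand_shader_constant(platform, index, type_, name, value):
    # Emit the platform-native declaration for a SHADER_CONSTANT.
    if platform == "vulkan":
        # Vulkan specialization constant.
        return (f"layout (constant_id = {index}) "
                f"const {type_} {name} = {value};")
    if platform == "metal":
        # Metal function constant.
        return f"constant {type_} {name} [[function_constant({index})]];"
    # Fallback for APIs without specialization constants: a plain constant.
    return f"const {type_} {name} = {value};"

print(expand_shader_constant("vulkan", 0, "uint", "gRenderMode", 0))
# layout (constant_id = 0) const uint gRenderMode = 0;
```

The shader author writes one SHADER_CONSTANT line; the translator picks the declaration form, and the platform's own shader compiler does the actual compilation.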
Published by AntoineConffx almost 3 years ago
Quest 2 running 01_Transformations
Quest 2 running 09_ShadowPlayground
iMac with M1 chip running at 3840x2160 resolution
iPad with M1 chip running at 1024x1366 resolution
It is astonishing how well the iPad with the M1 chip performs.
Due to what we consider driver bugs, M1 hardware crashes in
Published by AntoineConffx over 3 years ago
This is our biggest update since we started this repository more than three years ago. This update is one of those "what we have learned from the last couple of projects that are using TF" updates and a few more things.
Aura - Windows DirectX 12 Geforce 980TI 1080p Driver 466.47
Aura - Windows Vulkan Geforce 980TI 1080p Driver 466.47
Aura - Ubuntu Vulkan Geforce RTX 2080 1080p
Aura - PS4
Aura - XBOX One original
Forge Shader Language (FSL) translator - after struggling with writing a shader translator for one and a half years, we restarted from scratch. This time we developed everything in Python, because it is cross-platform. We also picked a really "low-tech, keep it simple" approach. The idea is that a small game team can actually maintain the code base and write shaders efficiently. We wanted a shader translator that translates an FSL shader into the native shader language of each platform. This way, whatever shader compiler is used on that platform can take over the actual job of compiling the native code.
The reason we are doing this lies mostly in the general unreliability of DXC and SPIR-V, especially when it comes to cross-platform translation.
There is a Wiki entry that holds an FSL language primer and general information on how this works here:
Run-Time API Switching - we had some form of run-time API switching in an early version of The Forge. At the time, we did not expect it to be very useful, because most game teams do not switch APIs on the fly. In the meantime we found a use case on Android, where we have to reach a large number of devices. So we came up with a better solution that is more consistent with the overall architecture and works on at least the PC and Android platforms.
On Windows PC one can switch between DX12, Vulkan, and DX11 if all are supported. On Android one can switch between Vulkan and OpenGL ES 2.0. The latter allows us to target a much larger group of devices for business application frameworks. We could easily extend this architecture to other platforms like consoles.
This new API switching required us to change the rendering interfaces. It is therefore a breaking change for existing implementations, but we think upgrading takes little effort, and the resulting code is easier to read and maintain, improving the code base overall by being more consistent.
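Per-platform API selection can be modeled as a simple preference list. This is an illustrative sketch only; the real switching logic lives in the rendering interfaces:

```python
def pick_api(platform, supported):
    # Preference order per platform, as described above: DX12 over Vulkan
    # over DX11 on Windows; Vulkan over OpenGL ES 2.0 on Android.
    preferences = {
        "windows": ["DX12", "Vulkan", "DX11"],
        "android": ["Vulkan", "GLES2"],
    }
    for api in preferences.get(platform, []):
        if api in supported:
            return api
    return None  # no supported API on this platform

print(pick_api("android", {"GLES2"}))  # GLES2 on a low-end device
```

The GLES2 fallback is exactly what widens device reach: a phone without a usable Vulkan driver still gets a working renderer.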
Device Reset - This was implemented together with API switching. Windows forces game developers to respond to a crashing device driver by resetting the device. We implemented the functionality already in the last update here on GitHub. This update integrates it better into the OS base layer.
We also verified that the life cycle management for Windows in each application based on the IApp interface works now for device change, device reset and for API switching so that we can cover all cases of losing and recovering the device.
The functions for API switching and device reload and reset are:
void onRequestReload();
void onDeviceLost();
void onAPISwitch();
Variable Rate Shading (VRS) - we implemented VRS in a new unit test 35_VariableRateShading. It is only supported by DirectX 12 on Windows and XBOX Series S / X.
In this demo, we demonstrate two main ways of setting the shading rate:
The cubes use a per-draw shading rate, while the background uses a per-tile shading rate.
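The per-tile path amounts to filling a small rate map over screen tiles, typically coarser away from the center. A sketch of such a map (illustrative only, not the unit-test code):

```python
def build_rate_map(tiles_x, tiles_y):
    # Shading rate per tile: 1x1 (full rate) near the screen center,
    # 2x2 further out, 4x4 at the periphery.
    cx, cy = (tiles_x - 1) / 2.0, (tiles_y - 1) / 2.0
    max_d = max(cx, cy)
    rates = []
    for y in range(tiles_y):
        row = []
        for x in range(tiles_x):
            # Chebyshev distance from the center, normalized to [0, 1].
            d = max(abs(x - cx), abs(y - cy)) / max_d
            row.append("1x1" if d < 0.34 else "2x2" if d < 0.67 else "4x4")
        rates.append(row)
    return rates

rate_map = build_rate_map(8, 5)  # e.g. a 1280x800 target with 160px tiles
```

The GPU then shades each tile at its listed rate, spending full-rate work only where it is most visible.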
Notes:
Multi-Sample Anti-Aliasing (MSAA) - we added a dynamic way of picking MSAA to unit test 9 and the Visibility Buffer example on all platforms.
PC
PS4
PS5
Android & OpenGL ES 2 - the OpenGL ES 2 layer for Android is now more stable, better tested, and closer to production code. As mentioned above, on an Android phone one can switch between Vulkan and OpenGL ES 2 dynamically if both are supported.
Android & OpenGL ES 2 now additionally support unit test 17 - Entity Component System Test.
In general, we are testing many Android phones at the moment, at both the low and high ends of the spectrum, driven by the two Android projects we are currently working on.
PVS-Studio - we did another manual pass on the code base with PVS-Studio, a static code analyzer, to increase code quality.
Published by JenkinsConffx almost 4 years ago
As the year slowly winds down, we finally found time to do another release. First of all, Happy Holidays and a Happy New Year!
Most of us will take time off over the holiday season to spend it with our families. We should be back online in the middle of January 2021.
In the *OSBase.* files you can find a snippet of code that looks like this:

if (pApp->mSettings.mResetGraphics)
{
    pApp->Unload();
    pApp->Load();
    pApp->mSettings.mResetGraphics = false;
}
Breadcrumbs are user-defined markers used to pinpoint which command caused the GPU to stall.
In the Breadcrumb unit test, two markers get injected into the command list.
Pressing the crash button would result in a GPU hang.
In this situation, the first marker would be written before the draw command, but the second one would wait for the draw command to finish.
Due to the infinite loop in the shader, the second marker won't be written, and we can reason that the draw command has caused the GPU to hang.
We log the markers' information to verify this.
Check out this link for more info: D3D12 Device Removed Extended Data (DRED)
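The reasoning above can be modeled with a minimal simulation. This is an illustration of the inference, not the unit-test code; real breadcrumbs are written by the GPU into a dedicated buffer:

```python
def find_hanging_command(commands, markers_written):
    # Markers are injected before and after each command. The first command
    # whose leading marker was written but whose trailing marker was not is
    # the one that caused the hang.
    for i, cmd in enumerate(commands):
        before, after = 2 * i, 2 * i + 1
        if before in markers_written and after not in markers_written:
            return cmd
    return None  # all trailing markers present: no hang detected

# Simulate a hang in the draw: its leading marker (index 2) was written,
# but the trailing one (index 3) never was.
commands = ["clear", "draw", "present"]
print(find_hanging_command(commands, markers_written={0, 1, 2}))  # draw
```

Logging the marker buffer after a device removal lets you run exactly this inference offline.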
Here is what the current Lua support in the functional tests looks like:
DX11 refactor: we re-wrote the DX11 run-time a few times and ended up with the most straightforward version. This version only recently shipped in Hades along with the Vulkan run-time on PC.
YUV support: we now have YUV support for all our Vulkan API platforms: PC, Linux, Android, and Switch. There is a new functional test for YUV, and it runs on all these platforms:
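For reference, here is the standard BT.601 full-range YUV-to-RGB conversion math (the textbook formula, not The Forge's implementation, which samples YUV planes on the GPU):

```python
def yuv_to_rgb(y, u, v):
    # BT.601 full-range conversion; u and v are chroma values centered
    # at 128, y is luma in [0, 255].
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)

    def clamp(c):
        return max(0, min(255, round(c)))

    return clamp(r), clamp(g), clamp(b)

print(yuv_to_rgb(128, 128, 128))  # mid grey: (128, 128, 128)
```

In a renderer the same coefficients usually live in a fragment shader that samples the luma and chroma planes of the video frame.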
Audio: we removed the audio functional test. It was the only test that was released unfinished and didn't run on all our platforms. Our customers show love for FMOD ... it would make more sense to show an integration of that.
GitHub issues fixed:
Numerous other fixes ...
Published by AntoineConffx about 4 years ago
Here is a screenshot of Hades running on Switch:
Here is an article by Forbes about Hades being at the top of the Nintendo Switch Charts.
Hades is also a technology showcase for Intel's integrated GPUs on macOS and Windows. The game's target audience often owns those GPUs.
Here are the screenshots:
Windows:
macOS:
Linux:
Here are the screenshots:
Windows final scene:
Without denoising:
With denoising:
PS4:
macOS:
Published by AntoineConffx about 4 years ago
Here are the screenshots:
File system: our old file system was designed more for tools or Windows applications than for games. It consumed more memory than the whole rendering system and used Windows file methods extensively. That is the opposite of what you want in a game. It took us several months to correct the mistake and come up with a file system that is tailored towards games. That means the interface changed substantially. Thanks to all those who pointed this out. Sometimes it takes a couple of iterations to land on a design that is efficient.
If you look at the new interface there are still path related functions in there. They will be removed step-by-step.
Please check out the new file system interface and let us know what you think.
Android Vulkan: validation layer is now supported
Published by AntoineConffx about 4 years ago
Mobile Devices: DPI scaling is properly handled now so we shouldn't see messed up UI anymore on mobile devices
Android: the following Unit-tests are now included for Android:
gamepad support: tested with PS4 controller
sample size reduction
proper closing of apps with the back button
proper handling of vSync
.zip filesystem handling
shader compile #include directive support
overall stability improvements
improved swapchain creation process and proper handling of current frame index
Linux:
Published by AntoineConffx about 4 years ago
MTuner
MTuner was integrated into the Windows 10 runtime of The Forge following a request for more in-depth memory profiling capabilities by one of the developers we support. It has been adapted to work closely with our framework and its existing memory tracking capabilities to provide a complete picture of a given application’s memory usage.
To use The Forge’s MTuner functionality, simply drag and drop the .MTuner file generated alongside your application’s executable into the MTuner host app, and you can immediately begin analyzing your program’s memory usage. The intuitive interface and exhaustive supply of allocation info contained in a single capture file make it easy to identify usage patterns and hotspots, as well as track memory leaks down to the file and line number. The full documentation of MTuner can be found [here](https://milostosic.github.io/MTuner/).
Currently, this feature is only available on Windows 10, but support for additional platforms provided by The Forge is forthcoming.
Here is a screenshot of an example capture done on our first Unit Test, 01_Transformations:
Published by AntoineConffx about 4 years ago
Most of us are working from home now due to the Covid-19 outbreak. We are all trying to balance life and work in new ways. Since the last release we made a thorough pass through the macOS / iOS run-time, so that it is easier to make macOS your main development environment for games.
Unit-tests fixes:
Metal runtime fixes:
Closed Issues:
Published by AntoineConffx about 4 years ago
Based on request we are providing a Path Tracing Benchmark in 16_RayTracing. It allows you to compare the performance of three platforms:
We believe that every benchmarking tool should be open-source, so that everyone can see what the source code is doing. We will extend this benchmark to the non-public platforms we support to compare the PC performance with console performance.
The benchmark comes with batch files for all three platforms. Each run generates an HTML output file from the microprofiler that is integrated into TF. The default number of iterations is 64, but you can adjust that. There is a Readme file in the 16_RayTracing folder that describes the options.
Windows DirectX 12 DXR, GeForce RTX 2070 Super, 3840x1600, NVIDIA Driver 441.99
Windows Vulkan RTX, GeForce RTX 2070 Super, 3840x1600, NVIDIA Driver 441.99
Linux Vulkan RTX, Geforce RTX 2060, 1920x1080, NVIDIA Driver 435.21
We will adjust the output of the benchmark to what users request.
Published by AntoineConffx about 4 years ago
This release took much longer than expected ... :-)
Published by AntoineConffx almost 5 years ago
The Forge now has support for Sparse Virtual Textures on Windows and Linux with DirectX 12 / Vulkan. Sparse texture (also known as "virtual texture", "tiled texture", or "mega-texture") is a technique for loading huge textures (such as 16k x 16k or larger) into GPU memory.
It breaks the original texture down into small square or rectangular tiles so that only the visible parts are loaded.
The unit test 18_Virtual_Texture is using 7 sparse textures:
There is a unit test that shows a solar system where you can approach planets with Sparse Virtual Textures attached; the resolution of the texture increases as you get closer.
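The memory win comes from committing GPU memory at tile granularity. A quick back-of-the-envelope sketch, assuming 128x128-texel tiles and 4 bytes per texel (typical values, not confirmed by the text):

```python
def tile_grid(tex_w, tex_h, tile=128):
    # Number of tiles covering the texture in each dimension.
    return ((tex_w + tile - 1) // tile, (tex_h + tile - 1) // tile)

def resident_bytes(visible_tiles, tile=128, bpp=4):
    # Only visible tiles are backed by committed GPU memory.
    return visible_tiles * tile * tile * bpp

tiles_x, tiles_y = tile_grid(16384, 16384)  # 128 x 128 grid of tiles
full = 16384 * 16384 * 4                    # 1 GiB if fully resident
partial = resident_bytes(200)               # ~12.5 MiB for 200 visible tiles
print(tiles_x * tiles_y, full // 2**20, partial // 2**20)  # 16384 1024 12
```

Even before mip levels are considered, keeping only a few hundred visible tiles resident reduces a 1 GiB texture to a few megabytes of committed memory.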
Linux 1080p NVIDIA RTX 2060 Vulkan Driver version 435
Windows 10 1080p AMD RX550 DirectX 12 Driver number: Adrenaline software 19.10.1
Windows 10 1080p NVIDIA 1080 Vulkan Driver number: 418.81
Ephemeris 2 - the game Stormland from Insomniac was released. This game uses a custom version of Ephemeris 2. We worked on this project for more than six months.
Head over to Custom Middleware to check out the source code.