taichi

Productive, portable, and performant GPU programming in Python.

APACHE-2.0 License

Downloads: 32.8K
Stars: 24.7K
Committers: 222

taichi - v1.7.1 Latest Release

Published by github-actions[bot] 6 months ago

Highlights:

  • Bug fixes
    • Fix CFG aliasing error with matrix of matrix (#8445) (by Zhanlue Yang)
  • Documentation
    • Update offset.md (#8470) (by Kenshi Takayama)
    • Update math_module.md (#8471) (by Kenshi Takayama)
    • Update accelerate_pytorch.md | Fix typo in recap: Eeasy -> Easy (#8475) (by Aryan Garg)
  • Miscellaneous
    • Bump version to 1.7.1 (by Haidong Lan)
    • Bump taichi version to v1.8.0 (#8458) (by Zhanlue Yang)

Full changelog:

  • [Misc] Bump version to 1.7.1 (by Haidong Lan)
  • [bug] Fix abs on unsigned types (#8476) (by Lin Jiang)
  • [Doc] Update offset.md (#8470) (by Kenshi Takayama)
  • [Doc] Update math_module.md (#8471) (by Kenshi Takayama)
  • [Doc] Update accelerate_pytorch.md | Fix typo in recap: Eeasy -> Easy (#8475) (by Aryan Garg)
  • [Misc] Bump taichi version to v1.8.0 (#8458) (by Zhanlue Yang)
  • [lang] Warn about non-contiguous gradient tensors (#8450) (by Bob Cao)
  • [autodiff] Fix the type of cmp statements in autodiff (#8452) (by Lin Jiang)
  • [Bug] Fix CFG aliasing error with matrix of matrix (#8445) (by Zhanlue Yang)
  • [misc] Add flag to disable taichi header print (#8413) (by Chaoming Wang)

taichi - v1.7.0

Published by github-actions[bot] 11 months ago

1. New features

1.1 Real Function

We are excited to announce the stabilization of the Real Function feature in Taichi Lang v1.7.0. Initially introduced as an experimental feature in v1.0.0, it has now matured with enhanced capabilities and usability.

Key Updates

  • Decorator Change: The Real Function now uses @ti.real_func. The previous decorator, @ti.experimental.real_func, is deprecated.
  • Performance Improvements: Real Functions, unlike Taichi inline functions (@ti.func), are compiled as separate entities, akin to CUDA's device functions. This separation allows for recursive runtime calls and significantly faster compilation. For instance, the Cornell box example's compilation time is reduced from 2.34s to 1.01s on an i9-11900K when switching from inline to real functions.
  • Enhanced Functionality: Real Functions support multiple return statements, offering greater flexibility in coding.

Limitations

  • Backend Support: Real Functions are currently only compatible with LLVM-based backends, including CPU and CUDA.
  • Parallel Loops: Writing parallel loops within Real Functions is not supported. However, if called within a parallel loop in a kernel, the Real Function will be parallelized accordingly.

Important Note on Usage: Ensure all arguments and return values in Real Functions are explicitly type-hinted.

Usage Example

The following example demonstrates the recursive capability of Real Functions. The sum_func Real Function is used to calculate the sum of numbers from 1 to n, showcasing its ability to handle multiple return statements and variable recursion depths.

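import taichi as ti

# Real Functions require an LLVM-based backend (CPU or CUDA).
ti.init(arch=ti.cpu)

@ti.real_func
def sum_func(n: ti.i32) -> ti.i32:
    # Base case ends the recursion; the second return handles n > 0.
    if n == 0:
        return 0
    return sum_func(n - 1) + n

@ti.kernel
def sum(n: ti.i32) -> ti.i32:
    return sum_func(n)

print(sum(100))  # 5050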

You can find more examples of the real function in the repository.

1.2 Enhancements in Kernel Arguments and Return Values

Support for Multiple Return Values in Taichi Kernel:

A Taichi kernel can now return multiple values. To do so, specify a tuple as the return type: either write (ti.f32, s0) directly as the type hint, or use the standard Python forms typing.Tuple[ti.f32, s0] or, on Python 3.9 and above, tuple[ti.f32, s0]. The following example illustrates this new feature:

import taichi as ti

# Real Functions require an LLVM-based backend (CPU or CUDA).
ti.init(arch=ti.cpu)

s0 = ti.types.struct(a=ti.math.vec3, b=ti.i16)

@ti.real_func
def foo() -> (ti.f32, s0):
    return 1, s0(a=ti.math.vec3([100, 0.5, 3]), b=1)

@ti.kernel
def bar() -> (ti.f32, s0):
    return foo()

ret1, ret2 = bar()
print(ret1)  # 1.0
print(ret2)  # {'a': [100.0, 0.5, 3.0], 'b': 1}

Removal of Size Limit on Kernel Arguments and Return Values:

We have removed the size restrictions on kernel arguments and return values. Keeping them small is still advisable, because large arguments or return values can lead to substantially longer compile times. Sizes above 4KB have not been thoroughly tested, so we cannot guarantee that they work flawlessly.
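
One rough way to check whether a kernel signature stays in the tested range is to add up the element sizes by hand. The plain-Python sketch below does that. The `SCALAR_BYTES` table and the `packed_size` helper are illustrative names, not Taichi APIs, and the estimate assumes tightly packed data with no padding and 4KB = 4096 bytes, which may not match the layout Taichi actually uses.

```python
# Bytes per scalar for a few common primitive types.
SCALAR_BYTES = {"i16": 2, "i32": 4, "f32": 4, "f64": 8}

def packed_size(members):
    """Sum the bytes of (dtype, element_count) pairs, ignoring padding."""
    return sum(SCALAR_BYTES[dtype] * count for dtype, count in members)

# Example signature: two 4x4 f32 matrices plus one f32 scalar.
view_params = [("f32", 16), ("f32", 16), ("f32", 1)]
size = packed_size(view_params)
print(size, "bytes;", "within" if size <= 4096 else "beyond", "the tested range")
# 132 bytes; within the tested range
```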

1.3 Argument Pack

Taichi now provides Argument Packs, which cache parameters that stay unchanged across multiple kernel calls. This makes kernels more convenient to launch and reduces the overhead of passing the same parameters repeatedly.

Key Advantages

  • Argument Pack: User-defined data types that encapsulate multiple parameters into a single, manageable unit.
  • Buffering Capability: Store and reuse parameters that remain constant across kernel calls, reducing the overhead of repeated parameter passing.
  • Device-level Caching: Taichi optimizes performance by caching argpacks directly on the device.
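
The caching idea can be pictured with a small plain-Python model. This is only a conceptual sketch, not Taichi's implementation: `FakeDevice` and `CachedPack` are hypothetical names, and the "upload" is just a counter standing in for a host-to-device copy.

```python
class FakeDevice:
    """Stand-in for a GPU; counts how many host-to-device copies happen."""
    def __init__(self):
        self.uploads = 0

    def upload(self, values):
        self.uploads += 1
        return dict(values)  # pretend this copy lives in device memory


class CachedPack:
    """Bundle of named parameters that is uploaded only when it changes."""
    def __init__(self, device, **values):
        self.device = device
        self.values = values
        self._device_copy = None  # nothing uploaded yet

    def set(self, name, value):
        if self.values[name] != value:
            self.values[name] = value
            self._device_copy = None  # invalidate the cached copy

    def on_device(self):
        if self._device_copy is None:
            self._device_copy = self.device.upload(self.values)
        return self._device_copy


device = FakeDevice()
pack = CachedPack(device, far=1.0, near=0.1)
for _ in range(100):      # 100 "kernel launches" with unchanged parameters
    pack.on_device()
pack.set("far", 2.0)      # one parameter changes
pack.on_device()
print(device.uploads)     # 2
```

In this model, one hundred launches with identical parameters cost a single copy, and a second copy happens only after a value actually changes.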

Usage Example

import taichi as ti
ti.init()

# Defining a custom argument type using "ti.types.argpack"
view_params_tmpl = ti.types.argpack(view_mtx=ti.math.mat4, proj_mtx=ti.math.mat4, far=ti.f32)

# Declaration of a Taichi kernel leveraging Argument Packs
@ti.kernel
def p(view_params: view_params_tmpl) -> ti.f32:
    return view_params.far

# Instantiation of the argument pack
view_params = view_params_tmpl(
    view_mtx=ti.math.mat4(
        [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]),
    proj_mtx=ti.math.mat4(
        [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]),
    far=1)

# Executing the kernel with the Argument Pack
print(p(view_params))  # Outputs: 1.0

Supported Data Types

Argument Packs are currently compatible with a variety of data types, including scalar, matrix, vector, Ndarray, and Struct.

Limitations

Please note that Argument Packs currently do not support the following features and data types:

  • Ahead-of-Time (AOT) Compilation and Compute Graph
  • ti.template
  • ti.data_oriented

2. Improvements

2.1 CUDA Memory Allocation Improvements

Dynamic VRAM Allocation:

  • In our latest update, the CUDA backend has been optimized to dynamically allocate Video RAM (VRAM), significantly reducing the initial preallocation requirement. Now, less than 50MB is preallocated upon ti.init.

Changes in device_memory_GB and device_memory_fraction Usage:

  • These settings now apply only to preallocating memory for sparse data structures, such as ti.pointer. This preallocation occurs only once a sparse data structure is detected in your code.

Impact on VRAM Consumption:

  • Users can expect a noticeable decrease in VRAM usage with these enhancements. For instance:
    diffmpm3d: 3866 MB --> 3190 MB
    nerf_train_deploy: 5618 MB --> 4664 MB
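
To put the two figures above in relative terms, the arithmetic can be sketched as follows. The `saving` helper is an illustrative name; the numbers are the ones quoted in these notes.

```python
# Before/after VRAM figures (MB) quoted in these release notes.
vram_mb = {"diffmpm3d": (3866, 3190), "nerf_train_deploy": (5618, 4664)}

def saving(before, after):
    """Return (absolute saving in MB, saving as a fraction of the original)."""
    return before - after, (before - after) / before

for name, (before, after) in vram_mb.items():
    saved, fraction = saving(before, after)
    print(f"{name}: {saved} MB saved ({fraction:.1%})")
# diffmpm3d: 676 MB saved (17.5%)
# nerf_train_deploy: 954 MB saved (17.0%)
```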

2.2 CUDA SIMT APIs

Added the following ti.simt.block APIs:

  • ti.simt.block.sync_any_nonzero
  • ti.simt.block.sync_all_nonzero
  • ti.simt.block.sync_count_nonzero
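
For readers unfamiliar with block-level vote primitives, the plain-Python model below shows what each call is expected to compute. It assumes the three functions mirror CUDA's `__syncthreads_or`, `__syncthreads_and`, and `__syncthreads_count` (a barrier followed by a reduction over every thread's predicate). The `emulate_*` helpers are illustrative names and run on the host, not on a GPU.

```python
def emulate_sync_any_nonzero(predicates):
    """1 if at least one thread in the block passed a nonzero predicate."""
    return int(any(p != 0 for p in predicates))

def emulate_sync_all_nonzero(predicates):
    """1 if every thread in the block passed a nonzero predicate."""
    return int(all(p != 0 for p in predicates))

def emulate_sync_count_nonzero(predicates):
    """Number of threads in the block whose predicate is nonzero."""
    return sum(1 for p in predicates if p != 0)

# One predicate per thread of a hypothetical 8-thread block.
block = [0, 3, 0, 1, 1, 0, 0, 7]
print(emulate_sync_any_nonzero(block))    # 1
print(emulate_sync_all_nonzero(block))    # 0
print(emulate_sync_count_nonzero(block))  # 4
```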

2.3 Sparse grid APIs

Added a helper function to create a 2D/3D sparse grid, for example:

    # create a 2D sparse grid
    grid = ti.sparse.grid(
        {
            "pos": ti.math.vec2,
            "mass": ti.f32,
            "grid2particles": ti.types.vector(20, ti.i32),
        },
        shape=(10, 10),
    )

    # access
    grid[0, 0].pos = ti.math.vec2(1, 2)
    grid[0, 0].mass = 1.0
    grid[0, 0].grid2particles[2] = 123

2.4 GGUI

  • Added Metal backend support for GGUI

2.5 AOT

  • Added C-APIs of ti_import_cpu_memory() and ti_import_cuda_memory()
  • Added support for multiple AOT runtime devices
  • Added support for matrix/vector in compute graph in C-API
  • Added support for matrix/vector in compute graph in Python

2.6 Error reporting

  • Improved the quality and coverage of error messages

2.7 Autodiff

  • Added support for passing vector/matrix arguments to autodiff kernels
  • Added autodiff support for torch Tensor and Taichi ndarray on CPU and CUDA
  • Added support for passing grad tensors to primal kernels

3. Bug Fixes

3.1 Autodiff Bugfixes

  • Fixed a few bugs with use of ti.ad.Tape
  • Fixed a bug with random seed for loss

3.2 AOT Bugfixes

  • Fixed a few bugs with compute graph
  • Fixed a few bugs with C-API

3.3 API Bugfixes

  • Fixed a bunch of bugs related to Matrix/Vector
  • Fixed an error with Ndarray type check
  • Fixed a few errors with taichi.math APIs
  • Fixed an error with SNode destruction
  • Fixed an error with dataclass support for struct with matrix
  • Fixed an error with ti.func
  • Fixed a few errors with ti.struct and struct field
  • Fixed a few errors with Sparse Matrix

3.4 Build & Environment Bugfixes

  • Fixed a few compilation issues on Windows platform
  • Fixed an issue with cusolver dependency

3.5 GGUI Bugfixes

  • Fixed vec_to_euler, which broke GGUI cameras, and improved the handling of camera logic
  • Fixed ImGui widget size on HiDPI

4. Deprecation Notice

  • We have removed the CC backend because it is rarely used and lacks maintenance.
  • We are deprecating ti.experimental.real_func because it is no longer experimental. Please use ti.real_func instead.

5. Full changelog

Highlights:
   - **Bug fixes**
      - Fix macro error with ti_import_cpu_memory (#8401) (by **Zhanlue Yang**)
      - Fix argpack nesting issues (by **listerily**)
      - Convert matrices to structs in argpack type members, Fixing layout error (by **listerily**)
      - Fix error when returning a struct field member when the return … (#8271) (by **秋云未云**)
      - Fix Erroneous handling of ndarray in real function in CFG (#8245) (by **Lin Jiang**)
      - Fix issue with passing python-scope Matrix as ti.func argument (#8197) (by **Zhanlue Yang**)
      - Fix incorrect CFG Graph structure due to missing Block with OffloadedStmts on LLVM backend (#8113) (by **Zhanlue Yang**)
      - Fix type inference error with LowerMatrixPtr pass (#8105) (by **Zhanlue Yang**)
      - Set initial value for Cuda device allocation (#8063) (by **Zhanlue Yang**)
      - Fix the insertion position of the access chain (#7957) (by **Lin Jiang**)
      - Fix wrong datatype size when writing to ndarray from Python scope (by **Ailing Zhang**)
   - **CUDA backend**
      - Warn driver version if it doesn't support memory pool. (#7912) (by **Haidong Lan**)
   - **Documentation**
      - Fixing typo in impl.py on ti.grouped function documentation (#8407) (by **Quentin Warnant**)
      - Update doc about kernels and functions (#8400) (by **Lin Jiang**)
      - Update documentation (#8089) (by **Zhao Liang**)
      - Update docstring for inverse func (#8170) (by **Zhao Liang**)
      - Update type.md, add descriptions of the vector (#8048) (by **Chenzhan Shang**)
      - Fix a bug in faq.md (#7992) (by **Zhao Liang**)
      - Fix problems in type_system.md (#7949) (by **秋云未云**)
      - Add doc about struct arguments (#7959) (by **Lin Jiang**)
      - Fix docstring of mix function (#7922) (by **Zhao Liang**)
      - Update faq and ggui, and add them to CI (#7861) (by **Zhao Liang**)
      - Add kernel sync doc (#7831) (by **Zhao Liang**)
   - **Error messages**
      - Warn before calling the external function (#8177) (by **Lin Jiang**)
      - Add option to print full traceback in Python (#8160) (by **Lin Jiang**)
      - Let to_primitive_type throw an error if the type is a pointer (by **lin-hitonami**)
      - Update deprecation warning of the graph arguments (#7965) (by **Lin Jiang**)
   - **Language and syntax**
      - Add clz instruction (#8276) (by **Jett Chen**)
      - Move real function out of the experimental module (#8399) (by **Lin Jiang**)
      - Fix error with loop unique analysis for MatrixPtrStmt (#8307) (by **Zhanlue Yang**)
      - Pass DebugInfo from Python to C++ for ndarray and field (#8286) (by **魔法少女赵志辉**)
      - Support TensorType for SharedArray (#8258) (by **Zhanlue Yang**)
      - Use ErrorEmitter in type check passes (#8285) (by **魔法少女赵志辉**)
      - Implement struct DebugInfo and ErrorEmitter (#8284) (by **魔法少女赵志辉**)
      - Add TensorType support for Constant Folding (#8250) (by **Zhanlue Yang**)
      - Support TensorType for irpass::alg_simp() (#8225) (by **Zhanlue Yang**)
      - Support vector/matrix ndarray arguments in real function (by **Lin Jiang**)
      - Fix error on ndarray type check (by **Lin Jiang**)
      - Support real function in data-oriented classes (by **lin-hitonami**)
      - Let kernel support return type annotated with 'typing.Tuple' (by **lin-hitonami**)
      - Support tuple return value for kernel and real function (by **lin-hitonami**)
      - Let static assert be in static scope (#8217) (by **Lin Jiang**)
      - Avoid scalarization for AOS GlobalPtrStmt (#8187) (by **Zhanlue Yang**)
      - Support matrix return value for real function (by **lin-hitonami**)
      - Support ndarray argument for real function (by **lin-hitonami**)
      - Cast the scalar arguments and return values of ti.func if the type hints exist (#8193) (by **Lin Jiang**)
      - Handle MatrixPtrStmt for uniquely_accessed_pointers() (#8165) (by **Zhanlue Yang**)
      - Support struct arguments for real function (by **lin-hitonami**)
      - Merge irpass::half2_vectorize() with irpass::scalarize() (#8102) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after optimize_bit_struct_stores & determine_ad_stack_size (#8097) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::demote_operations() (#8096) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::lower_access() (#8091) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::make_block_local() (#8090) (by **Zhanlue Yang**)
      - Support TensorType for Dead-Store-Elimination (#8065) (by **Zhanlue Yang**)
      - Optimize alias checking conditions for store-to-load forwarding (#8079) (by **Zhanlue Yang**)
      - Support TensorType for Load-Store-Forwarding (#8058) (by **Zhanlue Yang**)
      - Fix TensorTyped error with irpass::make_thread_local() (#8051) (by **Zhanlue Yang**)
      - Fix numerical issue with auto_diff() (#8025) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::make_mesh_block_local() (#8030) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::make_thread_local() (#8028) (by **Zhanlue Yang**)
      - Support allocate with cuda memory pool and reduce preallocation size accordingly (#7929) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::demote_no_access_mesh_fors() (#7956) (by **Zhanlue Yang**)
      - Fix error with irpass::check_out_of_bound() for TensorTyped ExternalPtrStmt (#7997) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::demote_atomics() (#7943) (by **Zhanlue Yang**)
      - Separate out preallocation logics for runtime objects (#7938) (by **Zhanlue Yang**)
      - Remove deprecated funcs in __init__.py (#7941) (by **Lin Jiang**)
      - Remove deprecated sparse_matrix_builder function (#7942) (by **Lin Jiang**)
      - Remove deprecated compile option ndarray_use_cached_allocator (#7937) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::detect_read_only() (#7939) (by **Zhanlue Yang**)
      - Remove deprecated funcs in ti.ui (#7940) (by **Lin Jiang**)
      - Remove the support for 'is' (#7930) (by **Lin Jiang**)
      - Migrate irpass::scalarize() after irpass::offload() (#7919) (by **Zhanlue Yang**)
      - Raise error when the dimension of the ndrange does not equal to the number of the loop variable (#7933) (by **Lin Jiang**)
      - Remove a.atomic(b) (#7925) (by **Lin Jiang**)
      - Cancel deprecating native min/max (#7928) (by **Lin Jiang**)
      - Fix the api doc search problem (#7918) (by **Zhao Liang**)
      - Move irpass::scalarize() after irpass::auto_diff() (#7902) (by **Zhanlue Yang**)
      - Fix Ndarray fill with Matrix/Vector typed values (#7901) (by **Zhanlue Yang**)
      - Add cast to field.fill() interface (#7899) (by **Zhanlue Yang**)
      - Let nested data classes have methods (#7909) (by **Lin Jiang**)
      - Let kernel argument support matrix nested in a struct (by **lin-hitonami**)
      - Support the functions of dataclass as kernel argument and return value (#7865) (by **Lin Jiang**)
      - Fix a bug on PosixPath (#7860) (by **Zhao Liang**)
      - Postpone MatrixType scalarization to irpass::differentiation_validation_check() (#7839) (by **Zhanlue Yang**)
      - Postpone MatrixType scalarization to irpass::gather_meshfor_relation_types() (#7838) (by **Zhanlue Yang**)
   - **Miscellaneous**
      - Make clang-tidy happy on 'explicit' (#7999) (by **秋云未云**)
   - **OpenGL backend**
      - Fix: runtime caught error cannot be displayed in opengl (#7998) (by **秋云未云**)
   - **IR optimization passes**
      - Make merging casts int(int(x)) less aggressive (#7944) (by **Ailing**)
      - Fix redundant clone of stmts across offloaded tasks (#7927) (by **Ailing**)
   - **Refactor**
      - Refactor the argument passing logic of rwtexture and remove extra_args (#7914) (by **Lin Jiang**)

taichi - v1.6.0

Published by github-actions[bot] over 1 year ago

Deprecation Notice

  • We removed some APIs that were deprecated a long time ago. See the table below:
| Removed API | Replace with |
| --- | --- |
| Using atomic operations like a.atomic_add(b) | ti.atomic_add(a, b) or a += b |
| Using is and is not inside Taichi kernel and Taichi function | Not supported |
| Ndrange for loop with the number of the loop variables not equal to the dimension of the ndrange | Not supported |
| ti.ui.make_camera() | ti.ui.Camera() |
| ti.ui.Window.write_image() | ti.ui.Window.save_image() |
| ti.SOA | ti.Layout.SOA |
| ti.AOS | ti.Layout.AOS |
| ti.print_profile_info | ti.profiler.print_scoped_profiler_info |
| ti.clear_profile_info | ti.profiler.clear_scoped_profiler_info |
| ti.print_memory_profile_info | ti.profiler.print_memory_profiler_info |
| ti.CuptiMetric | ti.profiler.CuptiMetric |
| ti.get_predefined_cupti_metrics | ti.profiler.get_predefined_cupti_metrics |
| ti.print_kernel_profile_info | ti.profiler.print_kernel_profiler_info |
| ti.query_kernel_profile_info | ti.profiler.query_kernel_profiler_info |
| ti.clear_kernel_profile_info | ti.profiler.clear_kernel_profiler_info |
| ti.kernel_profiler_total_time | ti.profiler.get_kernel_profiler_total_time |
| ti.set_kernel_profiler_toolkit | ti.profiler.set_kernel_profiler_toolkit |
| ti.set_kernel_profile_metrics | ti.profiler.set_kernel_profiler_metrics |
| ti.collect_kernel_profile_metrics | ti.profiler.collect_kernel_profiler_metrics |
| ti.VideoManager | ti.tools.VideoManager |
| ti.PLYWriter | ti.tools.PLYWriter |
| ti.imread | ti.tools.imread |
| ti.imresize | ti.tools.imresize |
| ti.imshow | ti.tools.imshow |
| ti.imwrite | ti.tools.imwrite |
| ti.ext_arr | ti.types.ndarray |
| ti.any_arr | ti.types.ndarray |
| ti.Tape | ti.ad.Tape |
| ti.clear_all_gradients | ti.ad.clear_all_gradients |
| ti.linalg.sparse_matrix_builder | ti.types.sparse_matrix_builder |
  • We no longer deprecate the builtin min/max functions in Taichi kernels.
  • We are deprecating some arguments used when declaring compute graph arguments; they will be removed in v1.7.0. These include:
    • element_shape argument for scalar and ndarray
    • shape, channel_format and num_channels arguments for texture
  • The CC backend will be removed in the next release (v1.7.0).
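
When upgrading an older code base, a small script can flag uses of the removed names listed above. The sketch below is one possible approach using only the standard library. `RENAMES` covers just a handful of the listed entries, and `find_removed_apis` is a hypothetical helper, not a tool shipped with Taichi. It matches text only, so it will also flag names that appear in comments or strings.

```python
import re

# A few of the removed APIs listed above (old name -> replacement).
RENAMES = {
    "ti.Tape": "ti.ad.Tape",
    "ti.imread": "ti.tools.imread",
    "ti.imwrite": "ti.tools.imwrite",
    "ti.ext_arr": "ti.types.ndarray",
    "ti.any_arr": "ti.types.ndarray",
    "ti.SOA": "ti.Layout.SOA",
    "ti.AOS": "ti.Layout.AOS",
}

def find_removed_apis(source):
    """Yield (line_number, old_name, replacement) for each removed API found."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in RENAMES.items():
            # Require name boundaries so "ti.Tape" does not match
            # "ti.TapeX" or "foo.ti.Tape".
            if re.search(rf"(?<![\w.]){re.escape(old)}(?!\w)", line):
                yield lineno, old, new

legacy = """\
with ti.Tape(loss):
    step()
img = ti.imread("in.png")
"""
for lineno, old, new in find_removed_apis(legacy):
    print(f"line {lineno}: {old} -> {new}")
# line 1: ti.Tape -> ti.ad.Tape
# line 3: ti.imread -> ti.tools.imread
```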

New features

Struct arguments

You can now use struct arguments in all backends. Structs can be nested, and they can contain matrices and vectors. Here's an example:

import taichi as ti

ti.init()

transform_type = ti.types.struct(R=ti.math.mat3, T=ti.math.vec3)
pos_type = ti.types.struct(x=ti.math.vec3, trans=transform_type)

@ti.kernel
def kernel_with_nested_struct_arg(p: pos_type) -> ti.math.vec3:
    return p.trans.R @ p.x + p.trans.T

trans = transform_type(ti.math.mat3(1), [1, 1, 1])
p = pos_type(x=[1, 1, 1], trans=trans)
print(kernel_with_nested_struct_arg(p))  # [4., 4., 4.]
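
The printed result can be checked by hand. The plain-Python sketch below repeats the computation `R @ x + T` with nested lists. It assumes, consistent with the `[4., 4., 4.]` output shown, that `ti.math.mat3(1)` fills all nine entries with 1 rather than building an identity matrix. The `transform` helper is an illustrative name.

```python
R = [[1.0] * 3 for _ in range(3)]   # assumed value of ti.math.mat3(1): all ones
x = [1.0, 1.0, 1.0]
T = [1.0, 1.0, 1.0]

def transform(R, x, T):
    """Compute R @ x + T for a 3x3 matrix and two 3-vectors."""
    return [sum(r * xi for r, xi in zip(row, x)) + t for row, t in zip(R, T)]

print(transform(R, x, T))  # [4.0, 4.0, 4.0]

# An identity matrix would give a different answer.
identity = [[float(i == j) for j in range(3)] for i in range(3)]
print(transform(identity, x, T))  # [2.0, 2.0, 2.0]
```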

Ndarray

  • Support 0 dim ndarray read & write in python scope
  • Fixed a bug when writing into ndarray from Python scope

Improvements

  • Added support for the rsqrt operator in autodiff
  • Added an assembly printer for the CPU backend (by Zhanlue Yang)
  • Added support for CUDA shared array allocations over 48 KiB

Performance

  • Improved vectorization support on CPU backend, with significant performance gains for specific applications

New Examples

  • 2D Euler fluid simulation example (by Lee-abcde)

Misc

  • Python 3.11 support
  • ti.frexp is supported on the CUDA, Vulkan, Metal, and OpenGL backends.
  • ti.math.popcnt intrinsic (by Garry Ling)
  • Fixed a memory leak issue during SNodeTree destruction (by Zhanlue Yang)
  • Added validation and improved error reporting for ti.Field finalization (by Zhanlue Yang)
  • Fixed a memory leak issue with the CUDA backend in C-API (by Zhanlue Yang)
  • Added support for formatted printing with str.format() and f-strings (by Tianyi Liu)
  • Changed Python code formatter from yapf to black

Developer Experience

  • Added a build.py script for preparing the build and testing environment

Full changelog

Highlights:

  • Bug fixes
    • Fix wrong datatype size when writing to ndarray from Python scope (by Ailing Zhang)
  • CUDA backend
    • Warn driver version if it doesn't support memory pool. (#7912) (by Haidong Lan)
    • Better handling shared array shape check (#7818) (by Haidong Lan)
    • Support large shared memory for CUDA backend (#7452) (by Haidong Lan)
  • Documentation
    • Add doc about struct arguments (#7959) (by Lin Jiang)
    • Fix docstring of mix function (#7922) (by Zhao Liang)
    • Update faq and ggui, and add them to CI (#7861) (by Zhao Liang)
    • Update doc for dynamic snode (#7804) (by Zhao Liang)
    • Update field.md (#7819) (by zhoooou)
    • Update readme (#7808) (by yanqingzhang)
    • Update write_test.md (#7745) (by Qian Bao)
    • Update performance.md (#7720) (by Zhao Liang)
    • Update readme (#7673) (by Zhao Liang)
    • Update tutorial.md (#7512) (by Chenzhan Shang)
    • Update gui_system.md (#7628) (by Qian Bao)
    • Remove deprecated api docstrings (#7596) (by pengyu)
    • Fix the cexp docstring (#7588) (by Zhao Liang)
    • Add doc about returning struct (#7556) (by Lin Jiang)
  • Error messages
    • Update deprecation warning of the graph arguments (#7965) (by Lin Jiang)
  • Language and syntax
    • Remove deprecated funcs in __init__.py (#7941) (by Lin Jiang)
    • Remove deprecated sparse_matrix_builder function (#7942) (by Lin Jiang)
    • Remove deprecated funcs in ti.ui (#7940) (by Lin Jiang)
    • Remove the support for 'is' (#7930) (by Lin Jiang)
    • Raise error when the dimension of the ndrange does not equal to the number of the loop variable (#7933) (by Lin Jiang)
    • Remove a.atomic(b) (#7925) (by Lin Jiang)
    • Cancel deprecating native min/max (#7928) (by Lin Jiang)
    • Let nested data classes have methods (#7909) (by Lin Jiang)
    • Let kernel argument support matrix nested in a struct (by lin-hitonami)
    • Support the functions of dataclass as kernel argument and return value (#7865) (by Lin Jiang)
    • Fix a bug on PosixPath (#7860) (by Zhao Liang)
    • Separate out the scalarization for MatrixOfMatrixPtrStmt and MatrixOfGlobalPtrStmt (#7803) (by Zhanlue Yang)
    • Fix pylance warning (#7805) (by Zhao Liang)
    • Support taking structs as kernel arguments (by lin-hitonami)
    • Fix math module circular import bugs (#7762) (by Zhao Liang)
    • Support formatted printing in str.format() and f-strings (#7686) (by 魔法少女赵志辉)
    • Replace internal representation of Python-scope ti.Matrix with numpy arrays (#7559) (by Yi Xu)
    • Stop letting ti.Struct inherit from TaichiOperations (#7474) (by Yi Xu)
    • Support writing sparse matrix as matrix market file (#7529) (by pengyu)
  • Vulkan backend
    • Fix repeated generation of array ranges in spirv codegen. (#7625) (by Haidong Lan)

Full changelog:

  • [CUDA] Warn driver version if it doesn't support memory pool. (#7912) (by Haidong Lan)
  • [Doc] Add doc about struct arguments (#7959) (by Lin Jiang)
  • [Error] Update deprecation warning of the graph arguments (#7965) (by Lin Jiang)
  • [windows] Workaround C++ mangling special chars (#7964) (by Ailing)
  • [Lang] Remove deprecated funcs in __init__.py (#7941) (by Lin Jiang)
  • [build] Remove redundant C-API shared object in wheel (#7950) (by Proton)
  • [test] Do not test cc backend (by Proton)
  • [Lang] Remove deprecated sparse_matrix_builder function (#7942) (by Lin Jiang)
  • [Lang] Remove deprecated funcs in ti.ui (#7940) (by Lin Jiang)
  • [Lang] Remove the support for 'is' (#7930) (by Lin Jiang)
  • [Lang] Raise error when the dimension of the ndrange does not equal to the number of the loop variable (#7933) (by Lin Jiang)
  • [Lang] Remove a.atomic(b) (#7925) (by Lin Jiang)
  • [Lang] Cancel deprecating native min/max (#7928) (by Lin Jiang)
  • [Doc] Fix docstring of mix function (#7922) (by Zhao Liang)
  • [example] Fix ti example bugs (#7903) (by Zhao Liang)
  • [ci] Build.py: Source generated env in new spawned shell (by Proton)
  • [misc] Fix changelog commit extract code (by Proton)
  • [ci] More robust build.py bootstrapping (#7920) (by Proton)
  • [Lang] [bug] Let nested data classes have methods (#7909) (by Lin Jiang)
  • [cuda] Only set CU_LIMIT_STACK_SIZE when necessary (#7906) (by Ailing)
  • [Lang] Let kernel argument support matrix nested in a struct (by lin-hitonami)
  • [Bug] Fix wrong datatype size when writing to ndarray from Python scope (by Ailing Zhang)
  • [lang] Support 0 dim ndarray read & write in python scope (by Ailing Zhang)
  • [Lang] Support the functions of dataclass as kernel argument and return value (#7865) (by Lin Jiang)
  • [spirv] Support struct as kernel argument (by Lin Jiang)
  • [spirv] Fix the ret type of frexp (by lin-hitonami)
  • [ci] Build.py: Do not try to bootstrap pip (too many issues) (#7897) (by Proton)
  • [ci] Build.py quirks fix (#7894) (by Proton)
  • [Doc] Update faq and ggui, and add them to CI (#7861) (by Zhao Liang)
  • [build] Remove unused apt pkg 'libmirclient-dev' to make 'build.py' run properly on ubuntu 22.04 (#7871) (by Yu Zhang)
  • [Lang] Fix a bug on PosixPath (#7860) (by Zhao Liang)
  • [ci] Polishing build.py, wave 4 (#7857) (by Proton)
  • [build] Use LLVM without zstd dependency on M1 Macs (#7856) (by Proton)
  • [doc] Update dev_install.md to reflect build.py usage (#7848) (by Proton)
  • [ci] Polishing build.py, wave 3 (#7845) (by Proton)
  • [lang] Add popcnt to llvm intrinsic support (#7772) (by Garry Ling)
  • [Doc] Update doc for dynamic snode (#7804) (by Zhao Liang)
  • [ci] Fix release build failure (#7834) (by Proton)
  • [ci] More robust build.py bootstrapping (#7833) (by Proton)
  • [Doc] Update field.md (#7819) (by zhoooou)
  • [autodiff] Remove redundant autodiff mode in kernel name (#7829) (by Ailing)
  • [lang] Migrate Caching Allocation logics from CudaDevice/AmdgpuDevice to DeviceMemoryPool (#7793) (by Zhanlue Yang)
  • [misc] Resolve code formatter frictions (#7828) (by Proton)
  • [Lang] Separate out the scalarization for MatrixOfMatrixPtrStmt and MatrixOfGlobalPtrStmt (#7803) (by Zhanlue Yang)
  • [bug] Fix imgui_context in destroying multiple GGUI windows (#7812) (by Ailing)
  • [misc] Update git-blame-ignore-revs (#7825) (by Proton)
  • [ci] Complete doc test list, remove redundant default prelude (#7823) (by Proton)
  • [misc] Relax Black formatter line length limit to 120 (#7824) (by Proton)
  • [Doc] Update readme (#7808) (by yanqingzhang)
  • [misc] Switch code formatter from yapf to black (#7785) (by Proton)
  • [CUDA] Better handling shared array shape check (#7818) (by Haidong Lan)
  • [misc] Improve ::liong::json::deserialize() (by PGZXB)
  • [bug] Fix gen_offline_cache_key (#7810) (by PGZXB)
  • [ci] Fix build.py ensurepip (#7811) (by Proton)
  • [Lang] Fix pylance warning (#7805) (by Zhao Liang)
  • [lang] Support frexp on spirv-based backends (#7770) (by Ailing)
  • [lang] Split MemoryPool into DeviceMemoryPool and HostMemoryPool (#7786) (by Zhanlue Yang)
  • [misc] Optimize import overhead: pytorch and get_clangpp (#7797) (by Haidong Lan)
  • [ci] [doc] Tighten up document testing (#7801) (by Proton)
  • [ci] Polishing build.py, wave 2 (#7800) (by Proton)
  • [aot] Remove unused AotDataConverter (#7799) (by Lin Jiang)
  • [perf] Fix Taichi CPU backend compile parameter to pair performance with Numba. (#7731) (by zhengxianli)
  • [ci] Polishing build.py (#7794) (by Proton)
  • [bug] Returning nan for ti.sym_eig on identity matrix (#7443) (by Yimin Tang)
  • [Lang] Support taking structs as kernel arguments (by lin-hitonami)
  • [ir] Add 'create_load' to ArgLoadStmt (by lin-hitonami)
  • [ir] Let the src of GetElementStmt be a pointer (by lin-hitonami)
  • [lang] Clean up runtime allocation functions (#7773) (by Zhanlue Yang)
  • [lang] Migrate CUDA preallocation logic to CudaMemoryPool (#7746) (by Zhanlue Yang)
  • [gfx] Fix runtime buffer/image copy barrier semantics (#7781) (by Bob Cao)
  • [misc] Remove unnecessary TaskCodeGenLLVM::task_counter (#7777) (by PGZXB)
  • [ci] Temporarily force Windows release builds to run on sm70 nodes (#7767) (by Proton)
  • [refactor] Remove Kernel::lowered_ (#7765) (by PGZXB)
  • [gui] Fluid visualization utilities (#7682) (by Qian Bao)
  • [Lang] Fix math module circular import bugs (#7762) (by Zhao Liang)
  • [misc] Make pre-commit happy (#7768) (by Proton)
  • [ci] Build iOS AOT static library (by Proton)
  • [misc] Wrap path with std::filesystem::path (#7754) (by Bob Cao)
  • [lang] Support vector and matrix dtypes in ti.field (#7761) (by Ailing)
  • [ir] Remove unnecessary field_dims_ in ArgLoadStmt (#7755) (by Ailing)
  • [refactor] Remove Kernel::task_counter_ (#7751) (by PGZXB)
  • [ci] Build.py: Introduce TAICHI_CMAKE_ARGS manager for better log readability (by Proton)
  • [ci] Reorganize build.py code (by Proton)
  • [refactor] Let KernelCompilationManager manage kernel compilation in gfx::AotModuleBuilderImpl (#7715) (by PGZXB)
  • [misc] Remove unused FullSimplifyPass::Args::program (#7750) (by PGZXB)
  • [refactor] Re-impl LlvmAotModule using LLVM::KernelLauncher (#7744) (by PGZXB)
  • [lang] Implement experimental CG(Conjugate Gradient) solver in Taichi-lang (#7690) (by Qian Bao)
  • [lang] Transform bit_shr to bit_sar for uint (#7757) (by Ailing)
  • [ir] Postpone scalarize and lower_matrix_ptr to after bit loop vectorization (#7726) (by 魔法少女赵志辉)
  • [ci] Isolate post sm70 tests (#7740) (by Proton)
  • [cuda] Support using SparseMatrix on more CUDA versions (#7724) (by Yu Zhang)
  • [cuda] Update the data layout of CUDA (#7748) (by Lin Jiang)
  • [ci] Ignore dup benchmark data points (#7749) (by Proton)
  • [bug] Fix reduction of atomic max (#7747) (by Lin Jiang)
  • [Doc] Update write_test.md (#7745) (by Qian Bao)
  • [refactor] Remove 'args' from 'RuntimeContext' (by lin-hitonami)
  • [gfx] Let gfx backends use LaunchContextBuilder to build arguments in struct type (by lin-hitonami)
  • [gfx] [refactor] Convert f16 in LaunchContextBuilder (by lin-hitonami)
  • [gfx] Record the struct type of arguments and results in KernelContextAttributes (by lin-hitonami)
  • [gfx] Compile struct type of result and arguments in gfx backends (by lin-hitonami)
  • [refactor] Implement CompiledKernelData::check() (#7743) (by PGZXB)
  • [doc] [test] Update docs for printing with f-strings and formatted strings (#7733) (by 魔法少女赵志辉)
  • [lang] Improve error message for mismatched index for ndarrays in python scope (#7737) (by Ailing)
  • [bug] Avoid redundant cache loading (#7741) (by PGZXB)
  • [refactor] Let KernelCompilationManager manage kernel compilation in LlvmAotModuleBuilder (#7714) (by PGZXB)
  • [ci] Skip large shared memory test for Turing GPUs. (#7739) (by Haidong Lan)
  • [cuda] Remove deprecated cusparse functions (#7725) (by Yu Zhang)
  • [misc] Update pull_request_template.md (#7738) (by Ailing)
  • [misc] Remove TI_WARN for cuda in memory_pool.cpp (#7734) (by Ailing)
  • [CUDA] Support large shared memory for CUDA backend (#7452) (by Haidong Lan)
  • [vulkan] Update SPIR-V codegen to emit FP16 consts (#7676) (by Bob Cao)
  • [lang] Support frexp on cuda backend (#7721) (by Ailing)
  • [refactor] Unify implementation of ProgramImpl::compile() (by PGZXB)
  • [refactor] Introduce LLVM::KernelLauncher (by PGZXB)
  • [refactor] Introduce gfx::KernelLauncher (by PGZXB)
  • [test] Enable test offline cache on amdgpu and dx11 (#7703) (by PGZXB)
  • [lang] Refactor ownership and inheritance of allocators (#7685) (by Zhanlue Yang)
  • [ci] Fix git cache quirks (#7722) (by Proton)
  • [lang] Improve error msg in create ndarray (#7709) (by Garry Ling)
  • [Doc] Update performance.md (#7720) (by Zhao Liang)
  • [bug] Switch the gallery image used by README. (#7716) (by Chengchen(Rex) Wang)
  • [lang] Merge AMDGPUCachingAllocator to the generic CachingAllocator (#7717) (by Zhanlue Yang)
  • [bug] Invalid Field cache, RWAccessors cache, and Kernel cache upon SNodeTree destruction (#7704) (by Zhanlue Yang)
  • [ci] [test] Enable cc test on CI (by lin-hitonami)
  • [test] [cc] Skip tests that cc backend doesn't support (by lin-hitonami)
  • [test] Exclude the cc backend from tests that involve dynamic indexing (#7705) (by 魔法少女赵志辉)
  • [bug] Fix camera controls (#7681) (by liblaf)
  • [bug] [cc] Fix comparison op in cc backend (by Lin Jiang)
  • [bug] [cc] Set external ptr for cc backend (by lin-hitonami)
  • [lang] Merged VirtualMemoryAllocator into MemoryPool for LLVM-CPU backend (#7671) (by Zhanlue Yang)
  • [misc] Remove useless JITEvaluatorId (#7700) (by PGZXB)
  • [bug] Fixed building with clang on Windows failed (#7699) (by PGZXB)
  • [Lang] Support formatted printing in str.format() and f-strings (#7686) (by 魔法少女赵志辉)
  • [ci] Git caching proxy in CI (#7692) (by Proton)
  • [build] Let msvc generate pdb for cpp & c_api tests (by lin-hitonami)
  • [refactor] Stop storing pointers to array devallocs in kernel args (by lin-hitonami)
  • [aot] Implement bin2c in AOT cppgen (#7687) (by PENGUINLIONG)
  • [cpu] Remove atomics demotion for single-thread CPU targets. (#7631) (by Haidong Lan)
  • [aot] Export templated kernels (#7683) (by PENGUINLIONG)
  • [ci] Revive /benchmark (#7680) (by Proton)
  • [Doc] Update readme (#7673) (by Zhao Liang)
  • [misc] Device API public headers and CMake rework part 1 (#7624) (by Bob Cao)
  • [misc] Move optimize cpu module to KernelCodeGen (#7667) (by PGZXB)
  • [lang] [ir] Extract and save the format specifiers in str.format() (#7660) (by 魔法少女赵志辉)
  • [example] Add 2D euler fluid simulation example (#7568) (by Lee-abcde)
  • [wasm] Remove WASM backend (by lin-hitonami)
  • [build] Fix ssize_t type undefined errors when building with TI_WITH_LLVM=OFF on windows (#7665) (by Yu Zhang)
  • [misc] Remove unused Kernel::is_evaluator (#7669) (by PGZXB)
  • [misc] Remove unused Program::jit_evaluator_cache and Program::jit_evaluator_cache_mut (#7668) (by PGZXB)
  • [misc] Simplify test_offline_cache.py (#7663) (by PGZXB)
  • [lang] Improve error reporting for FieldsBuilder finalization (#7640) (by Zhanlue Yang)
  • [misc] Rename taichi::lang::llvm to taichi::lang::LLVM (#7659) (by PGZXB)
  • [refactor] Remove MemoryPool daemon in LLVM runtime (#7648) (by Zhanlue Yang)
  • [opt] Cleanup unnecessary options in constant fold pass (#7661) (by Ailing)
  • [ci] Use build.py to prepare testing environment on Windows (#7658) (by Proton)
  • [opt] Move binary jit evaluator to host (by Ailing Zhang)
  • [test] Update C++ constant fold tests to test operator one by one (by Ailing Zhang)
  • [aot] Avoid shared library file being packaged into wheel data (#7652) (by Chenzhan Shang)
  • [ci] Fix scipy install (#7649) (by Proton)
  • [misc] Remove an unnecessary parameter of KernelCompilationManager::make_filename (by PGZXB)
  • [refactor] Remove some unnecessary functions of KernelCodeGen (by PGZXB)
  • [refactor] Re-impl JIT and Offline Cache on LLVM backends (by PGZXB)
  • [refactor] Implement llvm::KernelCompiler (by PGZXB)
  • [refactor] Gen code for KernelCodeGen::ir instead of KernelCodeGen::kernel->ir (by PGZXB)
  • [Doc] Update tutorial.md (#7512) (by Chenzhan Shang)
  • [ci] Test manylinux2014 build on PR (#7647) (by Proton)
  • [bug] Fix logical comparison returns -1 (#7641) (by Ailing)
  • [doc] Fix gui_system.md tests (#7646) (by Proton)
  • [Doc] Update gui_system.md (#7628) (by Qian Bao)
  • [aot] Hand-written CMake target script (#7644) (by PENGUINLIONG)
  • [ci] Do not use Android toolchain for perf testing (#7642) (by Proton)
  • [ci] Support Python 3.11 (#7627) (by Proton)
  • [build] Setup Android SDK environment for performance bot (#7635) (by Zhanlue Yang)
  • [ci] Update perf mon image (#7639) (by Proton)
  • [ci] Fix perf mon break (#7638) (by Proton)
  • [doc] Add documentation on using ghstack (#7632) (by Proton)
  • [build] Static linking libstdc++ on Linux (by Proton)
  • [ci] Rewrite Dockerfiles (by Proton)
  • [ci] Resolve "Needed single revision" workaround failure when the repo directory is empty (#7633) (by Proton)
  • [Vulkan] Fix repeated generation of array ranges in spirv codegen. (#7625) (by Haidong Lan)
  • [build] Switch to use docker with Android-SDK for performance bot (#7630) (by Zhanlue Yang)
  • [opengl] glfw finalize crash fix (by Proton)
  • [ci] build.py: Android support, entering shell, export env (by Proton)
  • [ci] Do not run tests with mixed backends (by Proton)
  • [refactor] Use f16 function from external lib (by lin-hitonami)
  • [refactor] Migrate members from RuntimeContext to LaunchContextBuilder (by lin-hitonami)
  • [bug] Fix setting arguments exceeding the max arg num (by lin-hitonami)
  • [cpu] Explicitly make cpu multithreading loop for range-fors. (#7593) (by Haidong Lan)
  • [aot] Fixed generator for compute graph (#7626) (by PENGUINLIONG)
  • [ir] Postpone scalarize and lower_matrix_ptr to after typecheck (#7589) (by 魔法少女赵志辉)
  • [aot] Header generator completed (#7609) (by PENGUINLIONG)
  • [amdgpu] Initialize AMDGPUContext with defaults (by Proton)
  • [build] Remove libSPIRV-Tools-shared.(so|dll) in wheel (by Proton)
  • [lang] Removed cpu_device(), cuda_device(), and amdgpu_device() from LlvmRuntimeExecutor (#7544) (by Zhanlue Yang)
  • [refactor] Remove the get/set functions in RuntimeContext (by lin-hitonami)
  • [aot] Pass LaunchContextBuilder to CompiledGraph::init_runtime_context (by lin-hitonami)
  • [gfx] Let GfxRuntime use LaunchContextBuilder (by lin-hitonami)
  • Let LaunchContextBuilder be the argument of the kernel launch function (by lin-hitonami)
  • [llvm] [refactor] Set the llvm runtime when executing (by lin-hitonami)
  • [refactor] Migrate {set, get}_{arg, ret} functions from RuntimeContext (by lin-hitonami)
  • [bug] Fix compilation error (#7606) (by PGZXB)
  • [aot] Hide map memory failure (#7604) (by PENGUINLIONG)
  • [refactor] Fix KernelCodeGen::kernel from Kernel * to const Kernel * (by PGZXB)
  • [refactor] Remove legacy implementation of llvm offline cache (by PGZXB)
  • [refactor] Impl llvm::CompiledKernelData (by PGZXB)
  • [bug] Type check for logical not op with real type inputs (#7600) (by Ailing)
  • [bug] Improve ndarray creation to fix segmentation fault (#7577) (by pengyu)
  • [lang] Add assembly printer for CPU backend (#7590) (by Zhanlue Yang)
  • [misc] Update docker filer (#7598) (by Zeyu Li)
  • [aot] Fix absolute path in generated TaichiTargets.cmake (#7597) (by Chenzhan Shang)
  • [Doc] Remove deprecated api docstrings (#7596) (by pengyu)
  • [llvm] Compile the kernel arguments to a StructType (by Lin Jiang)
  • [lang] Fix issue with llvm opaque pointer (#7557) (by Zhanlue Yang)
  • [opt] Constant folding for unary ops on host (#7573) (by Ailing)
  • [bug] Type check for bit_not op with real type inputs (#7592) (by Ailing)
  • [Doc] Fix the cexp docstring (#7588) (by Zhao Liang)
  • [Lang] Replace internal representation of Python-scope ti.Matrix with numpy arrays (#7559) (by Yi Xu)
  • [bug] Avoid cuda compilation via clang and ship pre-compiled .bc file instead (#7570) (by Zhanlue Yang)
  • [aot] Taichi kernel AOT command (#7565) (by PENGUINLIONG)
  • [bug] Fix struct members registered to StructField class (#7574) (by Ailing)
  • [aot] Mobile platform AOT build scripts (#7567) (by PENGUINLIONG)
  • [misc] Revert "Security upgrade ipython from 7.34.0 to 8.10.0 (#7341)" (#7571) (by Proton)
  • [test] Add cpp tests for constant folding pass (#7566) (by Ailing)
  • [misc] Security upgrade ipython from 7.34.0 to 8.10.0 (#7341) (by Chengchen(Rex) Wang)
  • [lang] Refactor CudaCachingAllocator into a more generic caching allocator (#7531) (by Zhanlue Yang)
  • [aot] Load GfxRuntime140 module from TCM (#7539) (by PENGUINLIONG)
  • [lang] Fixed useless serial shader to blit ExternalTensorShapeAlongAxisStmt on Metal (#7562) (by PENGUINLIONG)
  • [aot] Enable Vulkan 8bit storage (#7564) (by PENGUINLIONG)
  • [bug] Fix crashing on printing FrontendFuncCallStmt with no return value (by lin-hitonami)
  • [refactor] Remove LaunchContextBuilder::set_arg_raw (by lin-hitonami)
  • [llvm] Generalize TaskCodeGenLLVM::create_return to set_struct_to_buffer (by lin-hitonami)
  • [bug] Fix Cuda memory leak during TiRuntime destruction (#7345) (by Zhanlue Yang)
  • [ir] Let void struct type represent void type (by lin-hitonami)
  • [aot] Let C-API use LaunchContextBuilder to manage RuntimeContext (by lin-hitonami)
  • [ir] Let the reference type declare a pointer argument (by lin-hitonami)
  • [Doc] Add doc about returning struct (#7556) (by Lin Jiang)
  • [bug] Fix returning struct containing vec3 (#7552) (by Lin Jiang)
  • [lang] [ir] Extract and save the format specifiers in the f-string (#7514) (by 魔法少女赵志辉)
  • [Lang] Stop letting ti.Struct inherit from TaichiOperations (#7474) (by Yi Xu)
  • [aot] Recover AOT CI branch names (#7543) (by PENGUINLIONG)
  • [aot] Put TiRT in Python wheel and CMake script to find it in wheel (#7537) (by PENGUINLIONG)
  • [refactor] Remove the difficult-to-implement CompiledKernelData::size() (#7540) (by PGZXB)
  • [bug] Implement the missing clone function for FrontendFuncCallStmt (#7538) (by PGZXB)
  • [misc] Bump version to v1.6.0 (#7536) (by Haidong Lan)
  • [doc] Handle 2 digit minor versions correctly (#7535) (by Ritoban Roy-Chowdhury)
  • [aot] GfxRuntime140 convention docs (#7527) (by PENGUINLIONG)
  • [rhi] Refactor allocate_memory API to use RhiResult (#7463) (by Bob Cao)
  • [metal] Choose the proper msl version according to the device capability (#7506) (by Yu Zhang)
  • [Lang] Support writing sparse matrix as matrix market file (#7529) (by pengyu)
taichi - v1.5.0

Published by github-actions[bot] over 1 year ago

Deprecation Notice

New features

AOT

  • Taichi Runtime (TiRT) now supports Apple's Metal API and OpenGL ES for compatibility with older mobile platforms. Taichi programs can now be deployed to any mainstream consumer device.
    NOTE: Taichi program deployment on mobile platforms is experimental. Please contact us at [email protected] for long-term services.
  • Taichi AOT now fully supports float16 dtype.

Ndarray

  • Out-of-bounds checks are now supported on ndarrays.

Improvements

Python Frontend

Kernels can now return a struct on LLVM-based backends (CPU and CUDA). The struct can contain vectors and matrices, and it can also nest other structs. Here's an example.

import taichi as ti

ti.init(arch=ti.cpu)  # struct return needs an LLVM-based backend (CPU or CUDA)

s0 = ti.types.struct(a=ti.math.vec3, b=ti.i16)
s1 = ti.types.struct(a=ti.f32, b=s0)

@ti.kernel
def foo() -> s1:
    return s1(a=1, b=s0(a=ti.math.vec3(100, 0.2, 3), b=1))

print(foo())  # {'a': 1.0, 'b': {'a': [100.0, 0.2, 3.0], 'b': 1}}

Performance

  • Atomic operations on half2 are now supported on the CUDA backend (with compute capability > 60). You can enable this with ti.init(half2_vectorization=True). This feature can effectively accelerate NeRF training; please refer to this repo for details.

GGUI

  • GGUI no longer has computing backend restrictions! You can now use Metal, OpenGL, AMDGPU, or DirectX 11, in addition to CPU, CUDA, and Vulkan, which GGUI already supported.
  • GGUI has now been validated on Mesa's software rasterizer, lavapipe. You can use this solution for headless server visualization, or on servers with no graphics capabilities (such as A100).
  • Added the fps_limit option, which caps the maximum frame rate in GGUI.

Full changelog:

Highlights:
   - **AMDGPU backend**
      - Enable shared array on amdgpu backend (#7403) (by **Zeyu Li**)
      - Add print kernel amdgcn (#7357) (by **Zeyu Li**)
      - Add amdgpu backend profiler (#7330) (by **Zeyu Li**)
   - **Aot module**
      - Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
      - Deprecate element shape and field dim for AOT symbolic args (#7100) (by **Haidong Lan**)
   - **Bug fixes**
      - Fix copy_from() of StructField (#7294) (by **Yi Xu**)
      - Fix caching same loop invariant global vars inside nested fors (#7285) (by **Lin Jiang**)
      - Fix num_splits in parallel_struct_for (#7121) (by **Yi Xu**)
      - Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by **Yi Xu**)
   - **Documentation**
      - Update GGUI docs with correct API (#7525) (by **pengyu**)
      - Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
      - Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
      - Fix typo in API doc (#7511) (by **pengyu**)
      - Update math_module (#7405) (by **Zhao Liang**)
      - Update hello_world.md (#7400) (by **Zhao Liang**)
      - Update debugging.md (#7401) (by **Zhao Liang**)
      - Update hello_world.md (#7380) (by **Zhao Liang**)
      - Update type.md (#7376) (by **Zhao Liang**)
      - Update kernel_function.md (#7375) (by **Zhao Liang**)
      - Update hello_world.md (#7369) (by **Zhao Liang**)
      - Update hello_world.md (#7368) (by **Zhao Liang**)
      - Update data_oriented_class.md (#6790) (by **Zhao Liang**)
      - Update hello_world.md (#7367) (by **Zhao Liang**)
      - Update kernel_function.md (#7364) (by **Zhao Liang**)
      - Update hello_world.md (#7354) (by **Zhao Liang**)
      - Update llvm_sparse_runtime.md (#7323) (by **Gabriel Vainer**)
      - Update profiler.md (#7358) (by **Zhao Liang**)
      - Update kernel_function.md (#7356) (by **Zhao Liang**)
      - Update tut.md (#7352) (by **Gabriel Vainer**)
      - Update type.md (#7350) (by **Zhao Liang**)
      - Update hello_world.md (#7337) (by **Zhao Liang**)
      - Update append docstring (#7265) (by **Zhao Liang**)
      - Update ndarray.md (#7236) (by **Gabriel Vainer**)
      - Update llvm_sparse_runtime.md (#7215) (by **Zhao Liang**)
      - Remove doc tutorial (#7198) (by **Olinaaaloompa**)
      - Rename tutorial doc (#7186) (by **Zhao Liang**)
      - Update tutorial.md (#7176) (by **Zhao Liang**)
      - Update math_module.md (#7175) (by **Zhao Liang**)
      - Update debugging.md (#7173) (by **Zhao Liang**)
      - Fix C++ tutorial does not display on doc site (#7174) (by **Zhao Liang**)
      - Update doc regarding dynamic index (#7148) (by **Yi Xu**)
      - Move glossary to top level (#7118) (by **Zhao Liang**)
      - Update type.md (#7038) (by **Zhao Liang**)
      - Fix docstring (#7065) (by **Zhao Liang**)
   - **Error messages**
      - Allow IfExp on matrices when the condition is scalar (#7241) (by **Lin Jiang**)
      - Remove deprecations in ti.ui in 1.6.0 (#7229) (by **Lin Jiang**)
      - Remove deprecated ti.linalg.sparse_matrix_builder in 1.6.0 (#7228) (by **Lin Jiang**)
      - Remove deprecations in ASTTransformer in 1.6.0 (#7226) (by **Lin Jiang**)
      - Remove deprecated a.atomic_op(b) in Taichi v1.6.0 (#7225) (by **Lin Jiang**)
      - Remove deprecations in taichi/__init__.py in v1.6.0 (#7222) (by **Lin Jiang**)
      - Raise error when using deprecated ifexp on matrices (#7224) (by **Lin Jiang**)
      - Better error message when creating sparse snodes on backends that do not support sparse (#7191) (by **Lin Jiang**)
      - Raise errors when using metal sparse (#7113) (by **Lin Jiang**)
   - **GUI**
      - GGUI use shader "factory" (GGUI rework n/N) (#7271) (by **Bob Cao**)
   - **Intermediate representation**
      - Unified type system for internal operations (#6337) (by **daylily**)
   - **Language and syntax**
      - Keep ti.pyfunc (#7530) (by **Lin Jiang**)
      - Type check assignments between tensors (#7480) (by **Yi Xu**)
      - Fix pylance warnings raised by ti.static (#7437) (by **Zhao Liang**)
      - Deprecate arithmetic operations and fill() on ti.Struct (#7456) (by **Yi Xu**)
      - Fix pylance warnings by ti.random (#7439) (by **Zhao Liang**)
      - Fix pylance types warning (#7417) (by **Zhao Liang**)
      - Add better error message for dynamic snode (#7238) (by **Zhao Liang**)
      - Simplify the swizzle generator (#7216) (by **Zhao Liang**)
      - Remove the deprecated dynamic_index switch (#7195) (by **Yi Xu**)
      - Remove deprecated packed switch (#7104) (by **Yi Xu**)
      - Raise errors when using the packed switch (#7125) (by **Yi Xu**)
      - Fix cannot use taichi in REPL (#7114) (by **Zhao Liang**)
      - Remove deprecated ti.Matrix.rotation2d() (#7098) (by **Yi Xu**)
      - Remove filename kwarg in aot Module save() (#7085) (by **Ailing**)
      - Remove sourceinspect deprecation warning message (#7081) (by **Zhao Liang**)
      - Make slicing a single row/column of a matrix return a vector (#7068) (by **Yi Xu**)
   - **Miscellaneous**
      - Strictly check ndim with external array (#7126) (by **Haidong Lan**)

Full changelog:
   - [cc] Add deprecation notice for cc backend (#7651) (by **Ailing**)
   - [misc] Cherry pick struct return related commits (#7575) (by **Haidong Lan**)
   - [Lang] Keep ti.pyfunc (#7530) (by **Lin Jiang**)
   - [bug] Fix symbol conflicts with taichi_cpp_tests (#7528) (by **Zhanlue Yang**)
   - [bug] Fix numerical issue with TensorType'd arithmetics (#7526) (by **Zhanlue Yang**)
   - [aot] Enable Metal AOT test (#7461) (by **PENGUINLIONG**)
   - [Doc] Update GGUI docs with correct API (#7525) (by **pengyu**)
   - [misc] Implement KernelCompilationManager::clean_offline_cache (#7515) (by **PGZXB**)
   - [ir] Except shared array from demote atomics pass. (#7513) (by **Haidong Lan**)
   - [bug] Fix error with windows-clang compilation for cuda_runtime.cu (#7519) (by **Zhanlue Yang**)
   - [misc] Deprecate field dim and update deprecation warnings (#7491) (by **Haidong Lan**)
   - [build] Fix build failure without nvcc (#7521) (by **Ailing**)
   - [Doc] Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
   - [aot] Kernel argument count limit (#7518) (by **PENGUINLIONG**)
   - [Doc] Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
   - [AOT] [llvm] Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
   - [llvm] Let the offline cache record the type info of arguments and return values (by **lin-hitonami**)
   - [ir] Separate LaunchContextBuilder from Kernel (by **lin-hitonami**)
   - [Doc] Fix typo in API doc (#7511) (by **pengyu**)
   - [aot] Build Runtime C-API by default (#7508) (by **PENGUINLIONG**)
   - [bug] Fix run_tests.py --with-offline-cache (#7507) (by **PGZXB**)
   - [vulkan] Support printing constant strings containing % (#7499) (by **魔法少女赵志辉**)
   - [ci] Fix nightly version number, 2nd try (#7501) (by **Proton**)
   - [aot] Fixed memory leak in metal backend (#7500) (by **PENGUINLIONG**)
   - [ci] Fix nightly version number issue (#7498) (by **Proton**)
   - [example] Remove cv2, cairo dependency (#7496) (by **Zhao Liang**)
   - [type] Let Type * be serializable (by **lin-hitonami**)
   - [ci] Second attempt at permission check for ghstack landing (#7490) (by **Proton**)
   - [docs] Reword words of warning about building from source (#7488) (by **Anselm Schüler**)
   - [lang] Fixed double release of Metal command buffer (#7484) (by **PENGUINLIONG**)
   - [ci] Switch Android bots lock redis to bot-master (#7482) (by **Proton**)
   - [ci] Status check of ghstack CI bot (#7479) (by **Proton**)
   - [Lang] Type check assignments between tensors (#7480) (by **Yi Xu**)
   - [doc] Fix typo in ndarray.md (#7476) (by **Chenzhan Shang**)
   - [opt] Enable half2 optimization for atomic_add operations on CUDA backend (#7465) (by **Zhanlue Yang**)
   - [Lang] Fix pylance warnings raised by ti.static (#7437) (by **Zhao Liang**)
   - Let the LaunchContextBuilder manage the result buffer (by **lin-hitonami**)
   - [ci] Fix nightly build failure, and minor improvements (#7475) (by **Proton**)
   - [ci] Fix duplicated names in aot tests (#7471) (by **Ailing**)
   - [lang] Improve float16 support from Taichi type system (#7402) (by **Zhanlue Yang**)
   - [Lang] Deprecate arithmetic operations and fill() on ti.Struct (#7456) (by **Yi Xu**)
   - [misc] Add out of bound check for ndarray (#7458) (by **Ailing**)
   - [aot] Remove graph kernel interfaces (#7466) (by **PENGUINLIONG**)
   - [llvm] Let the RuntimeContext use the host result buffer (by **lin-hitonami**)
   - [gui] Fix 3d line drawing & add test (#7454) (by **Bob Cao**)
   - [lang] Fixed texture assertions (#7450) (by **PENGUINLIONG**)
   - [aot] Fixed header generator (#7455) (by **PENGUINLIONG**)
   - [aot] AOT module convention GfxRuntime140 (#7440) (by **PENGUINLIONG**)
   - [misc] Add an explicit error in cc backend codegen for dynamic indexing (#7449) (by **Ailing**)
   - [ci] Lower C++ tests concurrency (#7451) (by **Proton**)
   - [aot] Properly handle texture attributes (#7433) (by **PENGUINLIONG**)
   - [Lang] Fix pylance warnings by ti.random (#7439) (by **Zhao Liang**)
   - [ir] Get the StructType of the kernel parameters (by **lin-hitonami**)
   - [ci] Report failure (not throwing exception) when C++ tests fail (#7435) (by **Proton**)
   - [llvm] Allocate the result buffer from preallocated memory (by **lin-hitonami**)
   - [vulkan] Fix GGUI and vulkan swapchain on AMD drivers (#7382) (by **Bob Cao**)
   - [autodiff] Handle return statement (#7389) (by **Mingrui Zhang**)
   - [misc] Remove unnecessary functions of gfx::AotModuleBuilderImpl (#7425) (by **PGZXB**)
   - [bug] Fix offline_cache::clean_offline_cache_files (ti cache clean) (#7426) (by **PGZXB**)
   - [test] Refactor C++ tests runner (#7421) (by **Proton**)
   - [ci] Adjust perfmon GPU freq (#7429) (by **Proton**)
   - [misc] Remove AotModuleParams::enable_lazy_loading (#7424) (by **PGZXB**)
   - [aot] Use graphs.json instead of TCB (#7392) (by **PENGUINLIONG**)
   - [refactor] Introduce KernelCompilationManager (#7409) (by **PGZXB**)
   - [IR] Unified type system for internal operations (#6337) (by **daylily**)
   - [lang] Add is_lvalue() to Expr to check writeback_binary operand (#7414) (by **魔法少女赵志辉**)
   - [bug] Fix get_error_string ret type typo (#7418) (by **Zeyu Li**)
   - [aot] Reorganize graph argument creation process (#7412) (by **PENGUINLIONG**)
   - [Amdgpu] Enable shared array on amdgpu backend (#7403) (by **Zeyu Li**)
   - [Lang] Fix pylance types warning (#7417) (by **Zhao Liang**)
   - [aot] Simplify device capability assignment (#7407) (by **PENGUINLIONG**)
   - [Doc] Update math_module (#7405) (by **Zhao Liang**)
   - [ci] Lock GPU frequency in perf benchmarking (#7413) (by **Proton**)
   - [ci] Add 'Needed single revision' workaround to all tasks (#7408) (by **Proton**)
   - [Doc] Update hello_world.md (#7400) (by **Zhao Liang**)
   - [refactor] Introduce KernelCompiler and implement spirv::KernelCompiler (#7371) (by **PGZXB**)
   - [Amdgpu] Add print kernel amdgcn (#7357) (by **Zeyu Li**)
   - [Doc] Update debugging.md (#7401) (by **Zhao Liang**)
   - [refactor] Disable ASTSerializer::allow_undefined_visitor (#7391) (by **PGZXB**)
   - [amdgpu] Enable llvm FpOpFusion option on AMDGPU backend (#7398) (by **Zeyu Li**)
   - [aot] Add test for shared array (#7387) (by **Ailing**)
   - [vulkan] Change command list submit error message & misc device API cleanups (#7395) (by **Bob Cao**)
   - [bug] Fix arch_uses_spirv (#7399) (by **PGZXB**)
   - [gui] Fix ggui & vulkan swapchain sizes on HiDPI displays (#7394) (by **Bob Cao**)
   - [Doc] Update hello_world.md (#7380) (by **Zhao Liang**)
   - [aot] Remove support for depth24stencil8 format on Metal (#7377) (by **PENGUINLIONG**)
   - [bug] Add DeviceCapabilityConfig to offline cache key (#7384) (by **PGZXB**)
   - [Doc] Update type.md (#7376) (by **Zhao Liang**)
   - [refactor] Remove dependencies on Callable::program in cpp tests (#7373) (by **PGZXB**)
   - [lang] Experimental support of conjugate gradient solver (#7035) (by **pengyu**)
   - [aot] Metal interop APIs (#7366) (by **PENGUINLIONG**)
   - [Doc] Update kernel_function.md (#7375) (by **Zhao Liang**)
   - [gui] Add `fps_limit` for GGUI (#7374) (by **Bob Cao**)
   - [Doc] Update hello_world.md (#7369) (by **Zhao Liang**)
   - [aot] Fix blockers in static library build with XCode (#7365) (by **PENGUINLIONG**)
   - [vulkan] Remove GLFW from Vulkan rhi dependency (#7351) (by **Bob Cao**)
   - [misc] Remove useless semicolon in llvm_program.h (#7372) (by **PGZXB**)
   - [Doc] Update hello_world.md (#7368) (by **Zhao Liang**)
   - [Amdgpu] Add amdgpu backend profiler (#7330) (by **Zeyu Li**)
   - [lang] Stop broadcasting scalar cond in select statements (#7344) (by **魔法少女赵志辉**)
   - [bug] Fix validation errors due to inactive VK_KHR_16bit_storage (#7360) (by **Zhanlue Yang**)
   - [aot] Support texture in Metal (#7363) (by **PENGUINLIONG**)
   - [Doc] Update data_oriented_class.md (#6790) (by **Zhao Liang**)
   - [Doc] Update hello_world.md (#7367) (by **Zhao Liang**)
   - [refactor] Introduce lang::CompiledKernelData (#7340) (by **PGZXB**)
   - [bug] Fix matrix initialization error with numpy.floating data (#7362) (by **Zhanlue Yang**)
   - [Doc] Update kernel_function.md (#7364) (by **Zhao Liang**)
   - [test] [amdgpu] Fix bug with allocs bb in function body (#7308) (by **Zeyu Li**)
   - [Doc] Update hello_world.md (#7354) (by **Zhao Liang**)
   - [aot] Fixed C-API docs (#7361) (by **PENGUINLIONG**)
   - [refactor] Remove dependencies on Callable::program in lang::CompiledGraph::run (#7288) (by **PGZXB**)
   - [DOC] Update llvm_sparse_runtime.md (#7323) (by **Gabriel Vainer**)
   - [Doc] Update profiler.md (#7358) (by **Zhao Liang**)
   - [Doc] Update kernel_function.md (#7356) (by **Zhao Liang**)
   - [aot] Improve Taichi C++ wrapper implementation (#7347) (by **PENGUINLIONG**)
   - [Doc] Update tut.md (#7352) (by **Gabriel Vainer**)
   - [ci] Add doc snippet CI requirements (#7355) (by **Proton**)
   - [amdgpu] Update device memory free (#7346) (by **Zeyu Li**)
   - [Doc] Update type.md (#7350) (by **Zhao Liang**)
   - [aot] Enable 16-bit dtype support for Taichi AOT (#7315) (by **Zhanlue Yang**)
   - [example] Re-implement the Cornell Box demo with shorter lines of code (#7252) (by **HK-SHAO**)
   - [aot] AOT CI refactorization (#7339) (by **PENGUINLIONG**)
   - [llvm] Let the kernel return struct (by **lin-hitonami**)
   - [Doc] Update hello_world.md (#7337) (by **Zhao Liang**)
   - [ci] Reduce doc test concurrency (#7336) (by **Proton**)
   - [ir] Refactor result fetching (by **lin-hitonami**)
   - [ir] Get the offsets of elements in StructType (by **lin-hitonami**)
   - [misc] Delete test.py (#7332) (by **Bob Cao**)
   - [vulkan] More subgroup operations (#7328) (by **Bob Cao**)
   - [vulkan] Add vulkan profiler (#7295) (by **Haidong Lan**)
   - [refactor] Move TaichiLLVMContext::runtime_jit_module and TaichiLLVMContext::create_jit_module() to LlvmRuntimeExecutor (#7320) (by **PGZXB**)
   - [refactor] Remove dependencies on LlvmProgramImpl::get_llvm_context() in TaskCodeGenLLVM (#7321) (by **PGZXB**)
   - [ci] Checkout with privileged token when landing ghstack PRs (#7331) (by **Proton**)
   - [ir] Add fields to StructType (by **lin-hitonami**)
   - [gui] Remove renderable reuse & make renderable immediate (#7327) (by **Bob Cao**)
   - [Gui] GGUI use shader "factory" (GGUI rework n/N) (#7271) (by **Bob Cao**)
   - [bug] Fix u64 field cannot be assigned value >= 2 ** 63 (#7319) (by **Lin Jiang**)
   - [type] Let the compute type of quant uint be unsigned int (by **lin-hitonami**)
   - [doc] Replace slack with discord (#7318) (by **yanqingzhang**)
   - [refactor] Change print statement to warnings.warn in taichi.lang.util.warning (#7301) (by **Jett Chen**)
   - [ci] ChatOps: ghstack land (#7314) (by **Proton**)
   - [refactor] Remove TaichiLLVMContext::lookup_function_pointer() (#7312) (by **PGZXB**)
   - [misc] Update MSVC flags (#7254) (by **Bob Cao**)
   - [doc] [ci] Cover code snippets in docs (#7309) (by **Proton**)
   - [refactor] Remove dependencies on LlvmProgramImpl::get_llvm_context() in KernelCodeGen (#7289) (by **PGZXB**)
   - [rhi] Device upload readback functions (#7278) (by **Bob Cao**)
   - [aot] Fixed external project inclusion (#7297) (by **PENGUINLIONG**)
   - [Doc] Update append docstring (#7265) (by **Zhao Liang**)
   - [refactor] Remove dependencies on Callable::program in lang::get_hashed_offline_cache_key (#7287) (by **PGZXB**)
   - [ci] [amdgpu] Enable amdgpu backend python unit tests (#7293) (by **Zeyu Li**)
   - [Bug] Fix copy_from() of StructField (#7294) (by **Yi Xu**)
   - [ci] Adapt new Android phone behavior (#7306) (by **Proton**)
   - [Bug] Fix caching same loop invariant global vars inside nested fors (#7285) (by **Lin Jiang**)
   - [amdgpu] Part5 enable the api of amdgpu (#7202) (by **Zeyu Li**)
   - [amdgpu] Enable struct for on amdgpu backend (#7247) (by **Zeyu Li**)
   - [misc] Update external/asset which was accidentally downgraded in #7248 (#7284) (by **Lin Jiang**)
   - [amdgpu] Update runtime module (#7248) (by **Zeyu Li**)
   - [llvm] Remove unused argument 'arch' in LlvmProgramImpl::get_llvm_context (#7282) (by **Lin Jiang**)
   - [misc] Remove deprecated kwarg in rw_texture type annotations (#7267) (by **Ailing**)
   - [ci] Tolerate duplicates when registering version (#7281) (by **Proton**)
   - [gui] Fix GGUI destruction order (#7279) (by **Bob Cao**)
   - [doc] Rename /doc/ndarray_android to /doc/tutorial (#7273) (by **Lin Jiang**)
   - [llvm] Unify the llvm context of host and device (#7249) (by **Lin Jiang**)
   - [misc] Fix manylinux2014 warning not printing (#7270) (by **Proton**)
   - [ci] Building: add complete PATH set for conda (#7268) (by **Proton**)
   - [autodiff] Support rsqrt operator (#7259) (by **Mingrui Zhang**)
   - [ci] Update pre-commit repos version (#7257) (by **Proton**)
   - [refactor] Fix "const CompileConfig *" to "const CompileConfig &" (Part2) (#7253) (by **PGZXB**)
   - [refactor] Fix "const CompileConfig *" to "const CompileConfig &" (#7243) (by **PGZXB**)
   - [aot] Added third-party render thread task injection for Unity (#7151) (by **PENGUINLIONG**)
   - [aot] Support statically linked C-API library on MacOS (#7207) (by **Zhanlue Yang**)
   - [gui] Force GGUI to go through host memory (nuking interops) (#7218) (by **Bob Cao**)
   - [Error] Allow IfExp on matrices when the condition is scalar (#7241) (by **Lin Jiang**)
   - [bug] Fix the parity of the RNG (#7239) (by **Lin Jiang**)
   - [Lang] Add better error message for dynamic snode (#7238) (by **Zhao Liang**)
   - [DOC] Update ndarray.md (#7236) (by **Gabriel Vainer**)
   - [Error] Remove deprecations in ti.ui in 1.6.0 (#7229) (by **Lin Jiang**)
   - [Doc] Update llvm_sparse_runtime.md (#7215) (by **Zhao Liang**)
   - [lang] Add validation checks for subscripts to reject negative indices (#7212) (by **Zhanlue Yang**)
   - [refactor] Remove legacy num_bits and acc_offsets from AxisExtractor (#7227) (by **Yi Xu**)
   - [Error] Remove deprecated ti.linalg.sparse_matrix_builder in 1.6.0 (#7228) (by **Lin Jiang**)
   - [Error] Remove deprecations in ASTTransformer in 1.6.0 (#7226) (by **Lin Jiang**)
   - [misc] Export DeviceAllocation into Python & support devalloc in field_info (#7233) (by **Bob Cao**)
   - [gui] Use templated bulk copy to simplify VBO preperation (#7234) (by **Bob Cao**)
   - [rhi] Add create_image_unique stub & misc RHI bug fixes (#7232) (by **Bob Cao**)
   - [opengl] Fix GLFW global context issue (#7230) (by **Bob Cao**)
   - [examples] Remove dependency on `ti.u8` compute type for ngp (#7220) (by **Bob Cao**)
   - [refactor] Remove Kernel::offload_to_executable (#7210) (by **PGZXB**)
   - [opengl] RW image binding & FP16 support (#7219) (by **Bob Cao**)
   - [Error] Remove deprecated a.atomic_op(b) in Taichi v1.6.0 (#7225) (by **Lin Jiang**)
   - [Error] Remove deprecations in taichi/__init__.py in v1.6.0 (#7222) (by **Lin Jiang**)
   - [Error] Raise error when using deprecated ifexp on matrices (#7224) (by **Lin Jiang**)
   - [refactor] Remove legacy BitExtractStmt (#7221) (by **Yi Xu**)
   - [amdgpu] Part4 link bitcode file (#7180) (by **Zeyu Li**)
   - [example] Reorganize example oit_renderer (#7208) (by **Lin Jiang**)
   - [aot] Fix ndarray aot with information from type hints (#7214) (by **Ailing**)
   - [gui] Fix wide line support on macOS (#7205) (by **Bob Cao**)
   - [Lang] Simplify the swizzle generator (#7216) (by **Zhao Liang**)
   - [refactor] Split constructing and compilation of lang::Function (#7209) (by **PGZXB**)
   - [doc] Fix netlify build command (#7217) (by **Ailing**)
   - [ci] M1 buildbot release tag (#7213) (by **Proton**)
   - [misc] Remove unused task_funcs (#7211) (by **PGZXB**)
   - [refactor] Program::this_thread_config() -> Program::compile_config() (#7199) (by **PGZXB**)
   - [doc] Fix format issues of windows debugging (#7197) (by **Olinaaaloompa**)
   - [aot] More OpenGL interop in C-API (#7204) (by **PENGUINLIONG**)
   - [metal] Disable a kernel test in offline cache to unblock CI (#7154) (by **Ailing**)
   - [ci] Switch Windows build script to build.py (#6993) (by **Proton**)
   - [misc] Update submodule taichi_assets (#7203) (by **Lin Jiang**)
   - [mac] Use ObjectLinkingLayer instead of RTDyldObjectLinkingLayer for aarch64 mac (#7201) (by **Ailing**)
   - [misc] Remove unused Program::jit_evaluator_id (#7200) (by **PGZXB**)
   - [misc] Remove legacy latex generation (#7196) (by **Yi Xu**)
   - [Lang] Remove the deprecated dynamic_index switch (#7195) (by **Yi Xu**)
   - [bug] Fix check_matched() failure with Ndarray holding TensorType'd element (#7178) (by **Zhanlue Yang**)
   - [Doc] Remove doc tutorial (#7198) (by **Olinaaaloompa**)
   - [bug] Fix example circle-packing (#7194) (by **Lin Jiang**)
   - [aot] C-API opengl runtime interop (#7120) (by **damnkk**)
   - [Error] Better error message when creating sparse snodes on backends that do not support sparse (#7191) (by **Lin Jiang**)
   - [example] Fix ti gallery close warning (#7187) (by **Zhao Liang**)
   - [lang] Interface refactors for MatrixType and VectorType (#7143) (by **Zhanlue Yang**)
   - [aot] Find Taichi in python wheel (#7181) (by **PENGUINLIONG**)
   - [gui] Update circles rendering to use quads (#7163) (by **Bob Cao**)
   - [Doc] Rename tutorial doc (#7186) (by **Zhao Liang**)
   - [ir] Fix gcc cannot compile inline template specialization (#7179) (by **Lin Jiang**)
   - [Doc] Update tutorial.md (#7176) (by **Zhao Liang**)
   - [aot] Replace std::exchange with local implementation for C++11 (#7170) (by **PENGUINLIONG**)
   - [ci] Fix near cache urls (missing comma) (#7158) (by **Proton**)
   - [docs] Create windows_debug.md (#7164) (by **Bob Cao**)
   - [Doc] Update math_module.md (#7175) (by **Zhao Liang**)
   - [aot] FindTaichi CMake module to help outside project integration (#7168) (by **PENGUINLIONG**)
   - [aot] Removed unused archs in C-API (#7167) (by **PENGUINLIONG**)
   - [Doc] Update debugging.md (#7173) (by **Zhao Liang**)
   - [refactor] Remove dependencies on Program::this_thread_config() in irpass::constant_fold (#7159) (by **PGZXB**)
   - [Doc] Fix C++ tutorial does not display on doc site (#7174) (by **Zhao Liang**)
   - [aot] C++ wrapper for memory slice and memory allocation with host access (#7171) (by **PENGUINLIONG**)
   - [aot] Fixed ti_get_last_error signature (#7165) (by **PENGUINLIONG**)
   - [misc] Log to stderr instead of stdout (#7166) (by **PENGUINLIONG**)
   - [aot] C-API get version wrapper (#7169) (by **PENGUINLIONG**)
   - [doc] Fix spelling of "paticle_field" (#7024) (by **Xiang (Kevin) Li**)
   - [misc] Remove useless Program::sync (#7160) (by **PGZXB**)
   - [doc] Update accelerate_python.md to use ti.max (#7161) (by **Tao Jin**)
   - [doc] Add doc ndarray (#7157) (by **Olinaaaloompa**)
   - [mac] Add .dylib and .cmake to built wheel (#7156) (by **Ailing**)
   - [refactor] Remove dependencies on Program::this_thread_config() in some tests (#7155) (by **PGZXB**)
   - [refactor] Remove dependencies on Program::this_thread_config() in llvm backends codegen (#7153) (by **PGZXB**)
   - [Lang] Remove deprecated packed switch (#7104) (by **Yi Xu**)
   - [example] Update quaternion arithmetics in fractal_3d_ggui (#7139) (by **Zhao Liang**)
   - [doc] Update field.md (Fields advanced) (#6867) (by **Gabriel Vainer**)
   - [ci] Use make_changelog.py to generate the full changelog (#7152) (by **Lin Jiang**)
   - [refactor] Rename Callable::*arg* to Callable::*param* (#7133) (by **PGZXB**)
   - [aot] Introduce new AOT deployment tutorial (#7144) (by **PENGUINLIONG**)
   - [bug] Unify error message matching with/without validation layers for CapiTest.FailMapDeviceOnlyMemory (#7110) (by **Zhanlue Yang**)
   - [lang] Remove redundant TensorType expansion for function returns (#7124) (by **Zhanlue Yang**)
   - [lang] Sign python library for Apple M1 (#7138) (by **PENGUINLIONG**)
   - [gui] Fix particle size limits (#7149) (by **Bob Cao**)
   - [lang] Migrate TensorType expansion in MatrixType/VectorType from Python code to Frontend IR (#7127) (by **Zhanlue Yang**)
   - [aot] Support texture arguments for AOT kernels (#7142) (by **Zhanlue Yang**)
   - [metal] Retain Metal commandBuffers & build command buffers directly (#7137) (by **Bob Cao**)
   - [rhi] Update `create_pipeline` API and add support of VkPipelineCache (#7091) (by **Bob Cao**)
   - [autodiff] Support grad in ndarray (#6906) (by **PhrygianGates**)
   - [Doc] Update doc regarding dynamic index (#7148) (by **Yi Xu**)
   - [refactor] Remove dependencies on Program::this_thread_config() in spirv::lower (#7134) (by **PGZXB**)
   - [Misc] Strictly check ndim with external array (#7126) (by **Haidong Lan**)
   - [ci] Run test when pushing to rc branches (#7146) (by **Lin Jiang**)
   - [refactor] Remove dependencies on Program::this_thread_config() in KernelCodeGen (#7086) (by **PGZXB**)
   - [ci] Disable backward_cpp on macOS (#7145) (by **Proton**)
   - [gui] Fix scene line renderable (#7131) (by **Bob Cao**)
   - [refactor] Remove useless Kernel::from_cache_ (#7132) (by **PGZXB**)
   - [cpu] Reuse VirtualMemoryAllocator for CPU ndarray memory allocation (#7128) (by **Ailing**)
   - [Lang] Raise errors when using the packed switch (#7125) (by **Yi Xu**)
   - [ci] Temporarily disable ad_external_array on Metal (#7136) (by **Bob Cao**)
   - [Error] Raise errors when using metal sparse (#7113) (by **Lin Jiang**)
   - [aot] AOT compat test in workflow (#7033) (by **damnkk**)
   - [Lang] Fix cannot use taichi in REPL (#7114) (by **Zhao Liang**)
   - [lang] Free ndarray memory when it's GC-ed in Python (#7072) (by **Ailing**)
   - [lang] Migrate TensorType expansion for FuncCallExpression from Python code to Frontend IR (#6980) (by **Zhanlue Yang**)
   - [amdgpu] Part2 add runtime (#6482) (by **Zeyu Li**)
   - [refactor] Remove dependencies on Program::this_thread_config() in codegen_cc.cpp (#7088) (by **PGZXB**)
   - [refactor] Remove dependencies on Program::this_thread_config() in gfx::run_codegen (#7089) (by **PGZXB**)
   - [Bug] Fix num_splits in parallel_struct_for (#7121) (by **Yi Xu**)
   - [Doc] Move glossary to top level (#7118) (by **Zhao Liang**)
   - [metal] Update Metal RHI impl & add support for shared arrays (#7107) (by **Bob Cao**)
   - [ci] Update amdgpu ci (#7117) (by **Zeyu Li**)
   - [refactor] Move Kernel::lower() outside the taichi::lang::Kernel (#7048) (by **PGZXB**)
   - [amdgpu] Part1 add codegen (#6469) (by **Zeyu Li**)
   - [Aot] Deprecate element shape and field dim for AOT symbolic args (#7100) (by **Haidong Lan**)
   - [refactor] Remove Program::current_ast_builder() (#7075) (by **PGZXB**)
   - [aot] Switch Metal to SPIR-V codegen (#7093) (by **PENGUINLIONG**)
   - [Lang] Remove deprecated ti.Matrix.rotation2d() (#7098) (by **Yi Xu**)
   - [doc] Modified some errors in the function examples (#7094) (by **welann**)
   - [ci] More Windows git hacks (#7102) (by **Proton**)
   - [Lang] Remove filename kwarg in aot Module save() (#7085) (by **Ailing**)
   - [aot] Rename device capability atomic_i64 to atomic_int64 for consistency (#7095) (by **PENGUINLIONG**)
   - [Lang] Remove sourceinspect deprecation warning message (#7081) (by **Zhao Liang**)
   - [example] Remove gui warning message (#7090) (by **Zhao Liang**)
   - [refactor] Remove unnecessary Kernel::arch (#7074) (by **PGZXB**)
   - [refactor] Remove unnecessary parameter of irpass::scalarize (#7087) (by **PGZXB**)
   - [Bug] Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by **Yi Xu**)
   - [lang] Migrate TensorType expansion for TextureOpExpression from Python code to Frontend IR (#6968) (by **Zhanlue Yang**)
   - [lang] Migrate TensorType expansion for ReturnStmt from Python code to Frontend IR (#6946) (by **Zhanlue Yang**)
   - [doc] Update ndarray deprecation warning to 1.5.0 (#7083) (by **Haidong Lan**)
   - [amdgpu] Update amdgpu module call (#7022) (by **Zeyu Li**)
   - [amdgpu] Add convert addressspace pass related unit test (#7023) (by **Zeyu Li**)
   - [ir] Let real function return nested StructType (by **lin-hitonami**)
   - [ir] Replace FuncCallExpression with FrontendFuncCallStmt (by **lin-hitonami**)
   - [example] Update gallery images (#7053) (by **Zhao Liang**)
   - [Doc] Update type.md (#7038) (by **Zhao Liang**)
   - [misc] Bump version to v1.5.0 (#7077) (by **Lin Jiang**)
   - [rhi] Update Stream `new_command_list` API (#7073) (by **Bob Cao**)
   - [Doc] Fix docstring (#7065) (by **Zhao Liang**)
   - [ci] Workaround windows checkout 'Needed a single revision' issue (#7078) (by **Proton**)
   - [Lang] Make slicing a single row/column of a matrix return a vector (#7068) (by **Yi Xu**)
taichi - v1.4.1

Published by github-actions[bot] over 1 year ago

Highlights:

Full changelog:

  • [ci] Tolerate duplicates when registering version (#7281) (by Proton)
  • [misc] Fix manylinux2014 warning not printing (#7270) (by Proton)
  • [misc] Bump version to 1.4.1 (by Lin Jiang)
  • [misc] Update submodule taichi_assets (#7203) (by Lin Jiang)
  • [bug] Fix example circle-packing (#7194) (by Lin Jiang)
taichi - v1.4.0

Published by github-actions[bot] almost 2 years ago

Deprecation Notice

  • Support for sparse SNodes on the Metal backend has been removed.
  • ti.Matrix.rotation2d() has been removed.
  • The packed switch in ti.init() has been removed.
  • The dynamic_index switch in ti.init() is now deprecated and will be removed in v1.5.0. See the feature introduction below for details.
  • Slicing a single row/column of a matrix (e.g., a[x, a:b]) now returns a vector instead of a matrix.

New features

AOT

Taichi AOT is officially available in Taichi v1.4.0, along with a native Taichi Runtime (TiRT) library taichi_c_api. Native applications can now load compiled AOT modules and launch Taichi kernels without a Python interpreter.

In this release, the Vulkan backend of TiRT is stable on desktop platforms and Android. Prebuilt TiRT binaries are available on the release page. A comprehensive tutorial is available on the doc site, and the detailed TiRT C-API documentation is at https://docs.taichi-lang.org/docs/taichi_core.

Ndarray

Taichi ndarray is now formally released in v1.4.0. The ndarray is an array object that holds contiguous multi-dimensional data to allow easy exchange with external libraries. See documentation for more details.

Dynamic index

Before v1.4.0, when you wanted to access a vector/matrix with a runtime variable instead of a compile-time constant, you had to set ti.init(dynamic_index=True). However, that option worked only on LLVM-based backends (CPU & CUDA) and could slow down runtime performance because it affected all matrices. Starting from v1.4.0, the option is no longer needed: you can use variable indices on any backend, and matrices accessed only with constant indices keep their performance.

Improvements

Performance

  • Compilation is now roughly 2x faster.

Example list & ti gallery

Since v1.0.0, we have been enriching our taichi example collection, bringing the number of demos in the gallery window from eight to twelve. Run ti gallery to check out some new demos!

Bug fixes

  • Incorrect behavior of struct fors on sparse SNodes in certain cases has been fixed. (#7121)
  • CUDA will no longer allocate extra device memory when performing to_numpy() and from_numpy(). (#7008)
  • StructType is now allowed as a type hint to ti.func. (#6964)
  • Incorrect recompilation caused by filling in a matrix field with the same matrix has been fixed. (#6951)
  • Matrix type inference has been fixed. (#6928)
  • Getting 64-bit data from ndarrays in the Python scope is now handled correctly. (#6836)
  • Name collision problem in ti.dataclass has been fixed. (#6737)

Highlights:

  • Aot module
    • Deprecate element shape and field dim for AOT symbolic args (#7100) (by Haidong Lan)
  • Bug fixes
    • Fix num_splits in parallel_struct_for (#7121) (by Yi Xu)
    • Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by Yi Xu)
    • Fix getting 64-bit data from ndarray in Python scope (#6836) (by Yi Xu)
    • Avoid overwriting global tmp with dynamic_index=True (#6820) (by Yi Xu)
  • Build system
    • Deprecate export_core (#7028) (by Zhanlue Yang)
  • Command line interface
    • Add "ti cache clean" command to clean the offline cache files manually (#6937) (by PGZXB)
  • Documentation
    • Update tutorial.md (#7176) (by Zhao Liang)
    • Update math_module.md (#7175) (by Zhao Liang)
    • Update debugging.md (#7173) (by Zhao Liang)
    • Fix C++ tutorial does not display on doc site (#7174) (by Zhao Liang)
    • Update doc regarding dynamic index (#7148) (by Yi Xu)
    • Move glossary to top level (#7118) (by Zhao Liang)
    • Update type.md (#7038) (by Zhao Liang)
    • Fix docstring (#7065) (by Zhao Liang)
    • Remove packed mode in doc (#7030) (by Zhao Liang)
    • Minor doc update (#6952) (by Zhao Liang)
    • Glossary (#6101) (by Olinaaaloompa)
    • Update dac (#6875) (by Gabriel Vainer)
    • Update faq.md (#6921) (by Zhao Liang)
    • Update dataclass.md (#6876) (by Gabriel Vainer)
    • Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
    • Stop mentioning packed mode (#6755) (by Yi Xu)
  • Error messages
    • Raise errors when using metal sparse (#7113) (by Lin Jiang)
    • Do not show warning when the offline cache path does not exist (#7005) (by PGZXB)
  • GUI
    • Support colored texts (#7036) (by Dunfan Lu)
  • Intermediate representation
    • Allow a maximum of 12 SNode indices (#6901) (by Dunfan Lu)
  • Language and syntax
    • Raise errors when using the packed switch (#7125) (by Yi Xu)
    • Fix cannot use taichi in REPL (#7114) (by Zhao Liang)
    • Remove deprecated ti.Matrix.rotation2d() (#7098) (by Yi Xu)
    • Remove filename kwarg in aot Module save() (#7085) (by Ailing)
    • Remove sourceinspect deprecation warning message (#7081) (by Zhao Liang)
    • Make slicing a single row/column of a matrix return a vector (#7068) (by Yi Xu)
    • Deprecate the dynamic_index switch (#7071) (by Yi Xu)
    • Add irpass::eliminate_immutable_local_vars() test cases for TensorType (#7043) (by Zhanlue Yang)
    • Fix gui docstring (#7003) (by Zhao Liang)
    • Support dynamic indexing in spirv (#6990) (by Yi Xu)
    • Support dynamic indexing in metal (#6985) (by Yi Xu)
    • Support LU sparse solver on CUDA backend (#6967) (by pengyu)
    • Fix struct type problem (#6949) (by Zhao Liang)
    • Add warning message when converting dynamic snode to numpy (#6853) (by Zhao Liang)
    • Deprecate sourceinspect dependency (#6894) (by Zhao Liang)
    • Warn users if ndarray size is out of int32 boundary (#6846) (by Yi Xu)
    • Remove the real_matrix switch (#6885) (by Yi Xu)
    • Enable real_matrix and real_matrix_scalarize by default (#6801) (by Zhanlue Yang)
    • Raise an error for the semantic change of transpose() (#6813) (by Yi Xu)
    • Add bool type in python as an alias to i32 (#6742) (by daylily)
    • Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
  • Metal backend
    • Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)
  • Miscellaneous
    • Strictly check ndim with external array (#7126) (by Haidong Lan)
    • Refactored flattend_values() to avoid potential conflicts in flattened statements (#6749) (by Zhanlue Yang)

Full changelog:

  • [Doc] Update tutorial.md (#7176) (by Zhao Liang)
  • [aot] (cherry-pick) Removed unused archs in C-API (#7167), FindTaichi CMake module to help outside project integration (#7168) (#7177) (by PENGUINLIONG)
  • [docs] Create windows_debug.md (#7164) (by Bob Cao)
  • [Doc] Update math_module.md (#7175) (by Zhao Liang)
  • [Doc] Update debugging.md (#7173) (by Zhao Liang)
  • [Doc] Fix C++ tutorial does not display on doc site (#7174) (by Zhao Liang)
  • [doc] Fix spelling of "paticle_field" (#7024) (by Xiang (Kevin) Li)
  • [doc] Update accelerate_python.md to use ti.max (#7161) (by Tao Jin)
  • [aot] Fixed ti_get_last_error signature (#7165) (by PENGUINLIONG)
  • [example] Update quaternion arithmetics in fractal_3d_ggui (#7139) (by Zhao Liang)
  • [doc] Add doc ndarray (#7157) (by Olinaaaloompa)
  • [doc] Update field.md (Fields advanced) (#6867) (by Gabriel Vainer)
  • [ci] Use make_changelog.py to generate the full changelog (#7152) (by Lin Jiang)
  • [aot] Introduce new AOT deployment tutorial (#7144) (by PENGUINLIONG)
  • [Doc] Update doc regarding dynamic index (#7148) (by Yi Xu)
  • [Misc] Strictly check ndim with external array (#7126) (by Haidong Lan)
  • [ci] Run test when pushing to rc branches (#7146) (by Lin Jiang)
  • [ci] Disable backward_cpp on macOS (#7145) (by Proton)
  • [gui] Fix scene line renderable (#7131) (by Bob Cao)
  • [Lang] Raise errors when using the packed switch (#7125) (by Yi Xu)
  • [cpu] Reuse VirtualMemoryAllocator for CPU ndarray memory allocation (#7128) (by Ailing)
  • [ci] Temporarily disable ad_external_array on Metal (#7136) (by Bob Cao)
  • [Error] Raise errors when using metal sparse (#7113) (by Lin Jiang)
  • [misc] Cherry-pick #7072 into rc-v1.4.0 (#7135) (by Ailing)
  • [aot] Rename device capability atomic_i64 to atomic_int64 for consistency (#7095) (by PENGUINLIONG)
  • [Lang] Fix cannot use taichi in REPL (#7114) (by Zhao Liang)
  • [Bug] Fix num_splits in parallel_struct_for (#7121) (by Yi Xu)
  • [Doc] Move glossary to top level (#7118) (by Zhao Liang)
  • [Aot] Deprecate element shape and field dim for AOT symbolic args (#7100) (by Haidong Lan)
  • [Lang] Remove deprecated ti.Matrix.rotation2d() (#7098) (by Yi Xu)
  • [doc] Modified some errors in the function examples (#7094) (by welann)
  • [ci] More Windows git hacks (#7102) (by Proton)
  • [Lang] Remove filename kwarg in aot Module save() (#7085) (by Ailing)
  • [Lang] Remove sourceinspect deprecation warning message (#7081) (by Zhao Liang)
  • [example] Remove gui warning message (#7090) (by Zhao Liang)
  • [Bug] Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by Yi Xu)
  • [doc] Update ndarray deprecation warning to 1.5.0 (#7083) (by Haidong Lan)
  • [example] Update gallery images (#7053) (by Zhao Liang)
  • [Doc] Update type.md (#7038) (by Zhao Liang)
  • [Doc] Fix docstring (#7065) (by Zhao Liang)
  • [Lang] Make slicing a single row/column of a matrix return a vector (#7068) (by Yi Xu)
  • [ci] Workaround windows checkout 'Needed a single revision' issue (#7078) (by Proton)
  • [lang] Make sure ndarrays created in python frontend are initialized as zero (#7060) (by Ailing)
  • [Lang] Deprecate the dynamic_index switch (#7071) (by Yi Xu)
  • [misc] Update python package metadata (#7063) (by Proton)
  • [bug] Fixed compilation error caused by #7047 (#7069) (by PGZXB)
  • [opt] Automatically identify allocas to scalarize (#7055) (by Yi Xu)
  • [refactor] Remove ir parameter of KernelCodeGen::KernelCodeGen(Kernel *kernel, IRNode *ir) (#7046) (by PGZXB)
  • [refactor] Remove unnecessary IRNode::kernel (#7047) (by PGZXB)
  • [refactor] Remove dependencies on Program::current_ast_builder() in C++ side (#7044) (by PGZXB)
  • [ci] Version sanity check before publishing (#7062) (by Proton)
  • [ci] Make changelog generation working again (#7058) (by Proton)
  • [rhi] Update CommandList dispatch API (#7052) (by Bob Cao)
  • [aot] C-API versioning (#7050) (by PENGUINLIONG)
  • [refactor] Remove offloaded parameter of Program::compile() (#7045) (by PGZXB)
  • [lang] Migrate TensorType expansion for subscription indices from Python to Frontend IR (#6942) (by Zhanlue Yang)
  • [opt] Add ExtractPointers pass for dynamic index (#7051) (by Yi Xu)
  • [Lang] Add irpass::eliminate_immutable_local_vars() test cases for TensorType (#7043) (by Zhanlue Yang)
  • [Lang] Fix gui docstring (#7003) (by Zhao Liang)
  • [rhi] Update compute CommandList APIs (except dispatch) (#7037) (by Bob Cao)
  • [ir] Let GetElementExpression&Statement support index list (#7049) (by Lin Jiang)
  • [aot] C-API opengl runtime interop (#7042) (by PENGUINLIONG)
  • [ci] Pin pre-commit python version to 3.10 (#7041) (by Proton)
  • [opengl] Enable more gles tests in CI (#7031) (by Ailing)
  • [ci] Tuning headless demo VRAM usage (#7039) (by Proton)
  • [Build] Deprecate export_core (#7028) (by Zhanlue Yang)
  • [GUI] Support colored texts (#7036) (by Dunfan Lu)
  • [aot] Revert "C-API opengl runtime interop (#7014)" (#7032) (by Proton)
  • [ci] Update pre-commit app versions (#7025) (by Proton)
  • [Doc] Remove packed mode in doc (#7030) (by Zhao Liang)
  • Revert "[opengl] Enable more gles tests in CI" (#7029) (by Ailing)
  • [build] Remove libexport_core.so dependency for Android App CI (#6997) (by Zhanlue Yang)
  • [opengl] Enable more gles tests in CI (#7010) (by Ailing)
  • [aot] C-API opengl runtime interop (#7014) (by damnkk)
  • [misc] Add macro to control amdgpu-related header file (#7021) (by Zeyu Li)
  • [bug] Fix device memory allocation for numpy array on CUDA backend (#7008) (by Zhanlue Yang)
  • [ci] Try enabling MSVC and check build times (#6905) (by Bob Cao)
  • [gfx] Update Device API: Splitting ResourceBinder into seperate Shade… (#7020) (by Proton)
  • [gfx] Revert "Update Device API: Splitting ResourceBinder into sepera… (#7019) (by Proton)
  • [amdgpu] Update amdgpu device to new API (#7018) (by Bob Cao)
  • [perf] Fix fill ndarray size problem. (#6992) (by Haidong Lan)
  • [cuda] Fix LLVM15 rsqrt perf regression (#7012) (by Haidong Lan)
  • [gfx] Update Device API: Splitting ResourceBinder into seperate ShaderResourceSet & RasterResources (#6954) (by Bob Cao)
  • [opt] Add ImmediateIRModifier to provide amortized constant-time replace_usages_with() (#7001) (by Yi Xu)
  • [amdgpu] Part0 add render hardware interface (#6464) (by Zeyu Li)
  • [Error] Do not show warning when the offline cache path does not exist (#7005) (by PGZXB)
  • [Lang] [spirv] Support dynamic indexing in spirv (#6990) (by Yi Xu)
  • [misc] Remove unnecessary CompileConfig::lazy_compilation (#7009) (by PGZXB)
  • [ci] Add C++ tests on AMDGPU RHI (#6597) (by Zeyu Li)
  • [ci] Update taichi-release-tests branch (disable QuanTaichi GOL) (#7011) (by Proton)
  • [amdgpu] Part3 update runtime module (#6486) (by Zeyu Li)
  • [opengl] Fix tests running both on opengl and vulkan (#7006) (by Ailing)
  • [ir] Record the return types to a StructType (#6995) (by Lin Jiang)
  • [lang] Get the CHI-IR struct type in python (#6994) (by Lin Jiang)
  • [ir] Change type maps to unordered maps and add mutexes (#7000) (by Lin Jiang)
  • [ir] Add struct type to CHI-IR (#6982) (by Lin Jiang)
  • [misc] Add repography activity stats (#6991) (by Proton)
  • [aot] Enable validation layers for C-API tests (#6893) (by Zhanlue Yang)
  • [opengl] Add ti.gles arch and enable tests (#6988) (by Ailing)
  • [Lang] [metal] Support dynamic indexing in metal (#6985) (by Yi Xu)
  • [opengl] Reset opengl context when taichi program resets (#6987) (by Ailing)
  • [Lang] Support LU sparse solver on CUDA backend (#6967) (by pengyu)
  • [misc] Keeping up with new python-wheel implementation (#6986) (by Proton)
  • [aot] Recover AOT CI script (#6970) (by PENGUINLIONG)
  • [lang] Migrate TensorType expansion for svd from Python code to Frontend IR (#6972) (by Zhanlue Yang)
  • [misc] Adding XCode project support (#6976) (by Bob Cao)
  • [bug] Fix taichi_ngp starting from ti example (#6973) (by Ailing)
  • [ci] Revert "Fix missing c_api.so in linux nightly" (#6974) (by Ailing)
  • [ci] Build: auto install vulkan on Linux (#6969) (by Proton)
  • [ci] Auto setup miniforge3 env when build (#6966) (by Proton)
  • [Lang] Fix struct type problem (#6949) (by Zhao Liang)
  • [aot] C-API breaking changes! (#6955) (by PENGUINLIONG)
  • [lang] Fix scalarization for PrintStmt (#6945) (by Zhanlue Yang)
  • [bug] Allow StructType as type hint to ti.func (#6964) (by Yi Xu)
  • [refactor] Remove legacy code for dynamic index (#6961) (by Yi Xu)
  • [aot] Fix rwtexture with template_args (#6960) (by Ailing)
  • [ci] Fix missing c_api.so in linux nightly (#6962) (by Ailing)
  • [lang] Migrate TensorType expansion for SNode indices from Python to Frontend IR (#6934) (by Zhanlue Yang)
  • [doc] New FAQ added (#6963) (by Olinaaaloompa)
  • [ci] Sync CI cache script & workflow (#6959) (by Proton)
  • [ci] Update release test branch, reduce running time (#6944) (by Proton)
  • [ci] Remove redundant tests (#6947) (by Proton)
  • [bug] Fix recompilation of filling a matrix field with the same matrix (#6951) (by Yi Xu)
  • [aot] Fixed C-API behavior tests (#6939) (by PENGUINLIONG)
  • [refactor] Remove _PyScopeMatrixImpl (#6943) (by Yi Xu)
  • [aot] Fix validation warning: OpImageFetch should operate on OpImage instead of OpSampledImage (#6925) (by Zhanlue Yang)
  • [CLI] Add "ti cache clean" command to clean the offline cache files manually (#6937) (by PGZXB)
  • [Doc] Minor doc update (#6952) (by Zhao Liang)
  • [ci] Fix forgotten build script paths (#6941) (by Proton)
  • [opt] Add pass eliminate_immutable_local_vars (#6926) (by Yi Xu)
  • [ci] Fix pre-commit errors (#6940) (by Proton)
  • [doc] Editorial updates (#6935) (by Olinaaaloompa)
  • [ci] Workflow Rewrite: Building on Linux (#6848) (by Proton)
  • [refactor] Remove _IntermediateMatrix and _MatrixFieldElement (#6932) (by Yi Xu)
  • [aot] C_API behavior test (#6904) (by damnkk)
  • [lang] Fix matrix type inference and remove _MatrixEntriesInitializer (#6928) (by Yi Xu)
  • [lang] Reorder sparse matrix before solving (#6886) (by pengyu)
  • [Doc] Glossary (#6101) (by Olinaaaloompa)
  • [aot] Refactor C-API error tests (#6890) (by Zhanlue Yang)
  • [doc] Update layout.md (Fields) (#6868) (by Gabriel Vainer)
  • [Doc] Update dac (#6875) (by Gabriel Vainer)
  • [lang] Support 'len' with Matrix-typed operands (#6923) (by Zhanlue Yang)
  • [doc] Update sparse.md (#6908) (by Gabriel Vainer)
  • [doc] Update performance.md (#6911) (by Gabriel Vainer)
  • [doc] Update debugging.md (#6909) (by Gabriel Vainer)
  • [doc] Update profiler.md (#6910) (by Gabriel Vainer)
  • [bug] Add GetElementExpression to offline cache key (#6918) (by PGZXB)
  • [ci] Reenable AMDGPU CI, disable OpenGL tests in AMDGPU task (#6887) (by Proton)
  • [lang] Fix accidental changes during matrix refactor (#6914) (by Yi Xu)
  • [example] Add circle-packing example (#6870) (by Zhao Liang)
  • [Doc] Update faq.md (#6921) (by Zhao Liang)
  • [misc] Show suggestion when locking metadata.lock fails (#6919) (by PGZXB)
  • [doc] New FAQs (#6055) (by Olinaaaloompa)
  • [example] Add poission disk sampling example (#6852) (by Zhao Liang)
  • [vulkan] Improve Vulkan RHI impl with lower overhead internal implementations (#6912) (by Bob Cao)
  • [doc] Link to LLVM 15 built for Visual Studio 2022 (#6916) (by PENGUINLIONG)
  • [lang] Fix issue of IfExpr with TensorTyped operands (#6897) (by Zhanlue Yang)
  • [doc] Update hello_world.md (#6889) (by Gabriel Vainer)
  • [IR] Allow a maximum of 12 SNode indices (#6901) (by Dunfan Lu)
  • [doc] Update odop.md (#6874) (by Gabriel Vainer)
  • [doc] Update external.md (#6869) (by Gabriel Vainer)
  • [Doc] Update dataclass.md (#6876) (by Gabriel Vainer)
  • [doc] Update cloth_simulation.md (#6898) (by Vissidarte-Herman)
  • [example] Update marching squares example (#6851) (by Zhao Liang)
  • [Lang] Add warning message when converting dynamic snode to numpy (#6853) (by Zhao Liang)
  • [Lang] Deprecate sourceinspect dependency (#6894) (by Zhao Liang)
  • [aot] Added C-API behavior tests (#6871) (by damnkk)
  • [aot] Gather satellite repo URLs (#6860) (by PENGUINLIONG)
  • [refactor] Remove _TiScopeMatrixImpl (#6892) (by Yi Xu)
  • [ci] Python test minor fixes (#6891) (by Proton)
  • [ir] Add ir_traits namespace to use less dynamic casts & Run CFG only ever once (#6812) (by Bob Cao)
  • [Lang] Warn users if ndarray size is out of int32 boundary (#6846) (by Yi Xu)
  • [build] Enable strip for libtaichi_c_api.so with Release Build (#6845) (by Zhanlue Yang)
  • [Lang] Remove the real_matrix switch (#6885) (by Yi Xu)
  • [build] Turn on function level linking for taichi_c_api (#6840) (by Zhanlue Yang)
  • [test] Remove tests with real_matrix=True and real_matrix_scalarize=True (#6873) (by Yi Xu)
  • [misc] Revert back to master after #6843 merged (#6883) (by Bob Cao)
  • [vulkan] Cleanup spdlog related logging from Vulkan RHI (#6843) (by Bob Cao)
  • [ci] Temporarily disable AMDGPU CI (#6872) (by Proton)
  • [Lang] Enable real_matrix and real_matrix_scalarize by default (#6801) (by Zhanlue Yang)
  • [bug] MatrixType bug fix: Fix error with static-grouped-ndrange (#6839) (by Zhanlue Yang)
  • [example] Fix jacobian example (#6849) (by Mingrui Zhang)
  • [bug] Fix flaky mass_spring_game_ggui.py on Mac M1 by setting up default values for VulkanCapabilities (#6850) (by Zhanlue Yang)
  • [example] Solve implicit fem using sparsee solver (#6827) (by pengyu)
  • [build] Migrate cmake targets from OBJECT to STATIC for libtaichi_c_api.so (#6831) (by Zhanlue Yang)
  • [Bug] Fix getting 64-bit data from ndarray in Python scope (#6836) (by Yi Xu)
  • [test] Avoid constant folding in overflow tests (#6835) (by Ailing)
  • [aot] Added C-API behavior test (#6837) (by damnkk)
  • [bug] Matrix refactor bug fix: Fix cross scope matrix operations (#6822) (by Zhanlue Yang)
  • [build] Refactored and removed RuntimeCUDA and RuntimeCUDAInjector (#6830) (by Zhanlue Yang)
  • [bug] Matrix refactor bug fix: Fix logical binary operations with TensorTyped operands (#6817) (by Zhanlue Yang)
  • [example] Add order-independent transparency example (#6829) (by Lin Jiang)
  • [opt] Re-enable constant folding when debug=True (#6824) (by Ailing)
  • [Bug] Avoid overwriting global tmp with dynamic_index=True (#6820) (by Yi Xu)
  • [bug] Matrix refactor bug fix: Fix restrictions on BinaryOp/TernaryOp operands' broadcasting (#6805) (by Zhanlue Yang)
  • [aot] C-API Device capability improvements (#6773) (by PENGUINLIONG)
  • [misc] Headers dependency cleanup from RHI (#6699) (by Bob Cao)
  • [ci] Revert "Temporarily disable desktop headless tests (#6811)" (#6816) (by Proton)
  • [misc] Bump version to v1.4.0 (#6804) (by PENGUINLIONG)
  • [ci] Add AMDGPU relected ci (#6743) (by Zeyu Li)
  • [test] Remove unnecessary duplicated python runtime test runs (#6808) (by Ailing)
  • [Lang] Raise an error for the semantic change of transpose() (#6813) (by Yi Xu)
  • [refactor] Remove unnecessary checks in program (#6802) (by Ailing)
  • [vulkan] Support texture type args in aot add_kernel (#6796) (by Ailing)
  • [ci] Temporarily disable desktop headless tests (#6811) (by Proton)
  • [bug] Fix name collision in ti.dataclass (#6737) (by Yi Xu)
  • [bug] MatrixType bug fix: Add additional restrictions for unpacking a Matrix (#6795) (by Zhanlue Yang)
  • [doc] Update docstring for grad replaced (#6800) (by Mingrui Zhang)
  • [build] Add MSBuild option to setup.py (#6724) (by Bob Cao)
  • [Lang] [type] Add bool type in python as an alias to i32 (#6742) (by daylily)
  • [lang] Use less gpu memory when building sparse matrix (#6781) (by pengyu)
  • [example] Add cuda options for sparse matrix examples (#6785) (by pengyu)
  • [misc] Remove usage of deprecated num_channels/channel_format type hint in rw_texture in codebase (#6791) (by Ailing)
  • [bug] MatrixType bug fix: Fix error with BLS (#6664) (by Zhanlue Yang)
  • [vulkan] Support rw_texture in aot add_kernel (#6789) (by Ailing)
  • [bug] MatrixType bug fix: Fix error with quant (#6776) (by Yi Xu)
  • [bug] MatrixType bug fix: Fix test_ad_gdar_diffmpm (#6786) (by Yi Xu)
  • [vulkan] Deprecate num_channels and channel_format args in rw_texture type annotation (#6782) (by Ailing)
  • [misc] Remove the default potential_bug label on bug report issues (#6784) (by Ailing)
  • [bug] MatrixType bug fix: Fix error with texture (#6775) (by Yi Xu)
  • [vulkan] Make sure kernel recompiles when texture dtype changes (#6774) (by Ailing)
  • [aot] Clean up exported symbols for libtaichi_c_api.so (#6140) (by Zhanlue Yang)
  • [Misc] Refactored flattend_values() to avoid potential conflicts in flattened statements (#6749) (by Zhanlue Yang)
  • [aot] Warn the user about out-of-range access in C++ wrapper (#6492) (by PENGUINLIONG)
  • [build] Initial distributed compiling support (#6762) (by Proton)
  • [aot] Revert C-API Device capability improvements (#6772) (by PENGUINLIONG)
  • [aot] C-API Device capability improvements (#6702) (by PENGUINLIONG)
  • [aot] C-API to get available archs (#6766) (by PENGUINLIONG)
  • [doc] Update sparse matrix document (#6719) (by pengyu)
  • [autodiff] Separate non-linear operators to an individual class (#6700) (by Mingrui Zhang)
  • [bug] Fix dereferencing nullptr (#6763) (by Yi Xu)
  • [Doc] Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
  • [doc] Update dev install about clang version (#6759) (by Ailing)
  • [build] Improve TI_WITH_CUDA guards for CUDA related test cases (#6698) (by Zhanlue Yang)
  • [Lang] Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
  • [lang] Improve sparse matrix building on GPU (#6748) (by pengyu)
  • [aot] JSON serde (#6754) (by PENGUINLIONG)
  • [bug] MatrixType bug fix: Fix error with to_numpy() and from_numpy() (#6726) (by Zhanlue Yang)
  • [Doc] Stop mentioning packed mode (#6755) (by Yi Xu)
  • [lang] Get the length of dynamic SNode by x.length() (#6750) (by Lin Jiang)
  • [llvm] Support nested struct with matrix return value on real function (#6734) (by Lin Jiang)
  • [Metal] [error] Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)
  • [build] Integrate backward_cpp to test targets for enabling C++ stack trace (#6697) (by Zhanlue Yang)
  • [aot] Load AOT module from memory (#6692) (#6714) (by PENGUINLIONG)
  • [ci] Add dockerfile.ubuntu-18.04.amdgpu (#6736) (by Zeyu Li)
  • [doc] Update LLVM10 -> LLVM15 in installation guide (#6747) (by Zhanlue Yang)
  • [misc] Fix warnings of taichi examples (#6740) (by PGZXB)
  • [example] Ti-example: instant ngp renderer (#6673) (by Youtian Lin)
  • [build] Use a separate prebuilt llvm15 binary for manylinux environment (#6732) (by Ailing)
taichi - v1.3.0

Published by github-actions[bot] almost 2 years ago

Deprecation Notice

  • Using sparse data structures on the Metal backend is now deprecated. The support for Dynamic SNode has been removed in v1.3.0, and the support for Pointer/Bitmasked SNode will be removed in v1.4.0.
  • The packed switch in ti.init() is now deprecated and will be removed in v1.4.0. See the feature introduction below for details.
  • ti.Matrix.rotation2d() is now deprecated and will be removed in v1.4.0. Use ti.math.rotation2d() instead.
  • To clearly distinguish vectors from matrices, transpose() on a vector is no longer allowed. If you want something like a @ b.transpose(), write a.outer_product(b) instead.
  • Ndarray: The element_dim, element_shape and field_dim arguments of the ndarray type annotation will be deprecated in v1.4.0. field_dim is renamed to ndim to make it more intuitive. element_dim and element_shape will be replaced by passing a matrix type as the dtype argument. For example, ti.types.ndarray(element_dim=2, element_shape=(3,3)) will be replaced by ti.types.ndarray(dtype=ti.matrix(3,3)).
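
The transpose() item above replaces a @ b.transpose() with a.outer_product(b). As a plain-Python sketch of what that operation computes (no Taichi involved; the helper below is illustrative only and is not the Taichi method itself), the outer product of a length-m vector and a length-n vector is the m-by-n matrix with entries a[i] * b[j]:

```python
def outer_product(a, b):
    """Return the len(a) x len(b) matrix whose (i, j) entry is a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]


a = [1, 2, 3]
b = [4, 5]
result = outer_product(a, b)
print(result)  # [[4, 5], [8, 10], [12, 15]]
```

This is the same matrix one would get by treating a as a column and b as a row and multiplying them, which is why the explicit transpose is no longer needed.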

New features

Dynamic SNode

To support variable-length fields, Taichi provides dynamic SNodes.
You can now use the dynamic SNode on fields of different data types, even struct fields and matrix fields.
You can use x[i].append(...) to append an element, use x[i].length() to get the length, and use x[i].deactivate() to clear the list as shown in the following code snippet.

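import taichi as ti

ti.init(arch=ti.cpu)  # any backend that supports dynamic SNodes, e.g. CPU or CUDA

pair = ti.types.struct(a=ti.i16, b=ti.i64)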
pair_field = pair.field()

block = ti.root.dense(ti.i, 4)
pixel = block.dynamic(ti.j, 100, chunk_size=4)
pixel.place(pair_field)
l = ti.field(ti.i32)
ti.root.dense(ti.i, 5).place(l)

@ti.kernel
def dynamic_pair():
    for i in range(4):
        pair_field[i].deactivate()
        for j in range(i * i):
            pair_field[i].append(pair(i, j + 1))
        # pair_field = [[],
        #              [(1, 1)],
        #              [(2, 1), (2, 2), (2, 3), (2, 4)],
        #              [(3, 1), (3, 2), ... , (3, 8), (3, 9)]]
        l[i] = pair_field[i].length()  # l = [0, 1, 4, 9]

Packed Mode

Packed mode was introduced in v0.8.0 to allow users to trade runtime performance for memory usage. In v1.3.0, after the elimination of runtime overhead in common cases, packed mode has become the default mode. There's no longer any automatic padding behavior behind the scenes, so users can use fields and SNodes without surprise.
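
To illustrate what the removed padding meant: to our understanding, the non-packed layout rounded each field dimension up to a power of two behind the scenes. The following plain-Python sketch models only that shape arithmetic (the helper names are hypothetical, and nothing here reflects Taichi's actual allocator):

```python
def next_pow2(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def element_counts(shape):
    """Return (padded, packed) element counts for a field of the given shape."""
    padded = 1
    packed = 1
    for dim in shape:
        padded *= next_pow2(dim)  # each axis rounded up to a power of two
        packed *= dim             # exactly the requested size
    return padded, packed


padded, packed = element_counts((18, 65))
print(padded, packed)  # 4096 1170
```

Under this assumption, an 18 x 65 field would occupy a 32 x 128 buffer when padded, which is the memory cost that packed mode avoids.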

Sparse Matrix

We introduce experimental sparse matrix and sparse solver support on the CUDA backend. The API is the same as on the CPU backend. Currently, only the f32 data type and the LLT linear solver are supported on CUDA, and SpMV and linear solver operations can only be computed with ti.ndarray. Support for the float64 data type and other linear solvers is under development.
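
For readers unfamiliar with the term, SpMV is the sparse matrix-vector product y = A x. The sketch below computes it in plain Python over a CSR (compressed sparse row) layout. It is not Taichi's sparse matrix API; the function and variable names are illustrative only.

```python
def csr_spmv(indptr, indices, data, x):
    """Multiply a CSR matrix by a dense vector x and return the dense result."""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        # Non-zeros of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [1, 0, 2]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 3.0, 1.0, 2.0]

y = csr_spmv(indptr, indices, data, [1.0, 2.0, 3.0])
print(y)  # [7.0, 6.0, 7.0]
```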

Improvements

Python Frontend

  • Matrix slicing now supports augmented assignment (e.g. +=) in addition to plain assignment.

Taichi Examples

  1. Our user https://github.com/Linyou contributed an excellent example of an instant NGP renderer in PR #6673. Run taichi_ngp to check it out!

[Developers only] LLVM15 upgrade

Starting from v1.3.0, Taichi has upgraded its LLVM dependency to version 15.0.0. If you're interested in contributing or simply building Taichi from source, please follow our installation doc for developers.
Note this change has no impact on Taichi users.

Highlights

  • Documentation
    • Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
    • Stop mentioning packed mode (#6755) (by Yi Xu)
  • Language and syntax
    • Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
  • Metal backend
    • Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)

Full changelog

  • [aot] Revert C-API Device capability improvements (#6772) (by PENGUINLIONG)
  • [aot] C-API Device capability improvements (#6702) (by PENGUINLIONG)
  • [aot] C-API to get available archs (#6766) (by PENGUINLIONG)
  • [doc] Update sparse matrix document (#6719) (by pengyu)
  • [autodiff] Separate non-linear operators to an individual class (#6700) (by Mingrui Zhang)
  • [bug] Fix dereferencing nullptr (#6763) (by Yi Xu)
  • [Doc] Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
  • [doc] Update dev install about clang version (#6759) (by Ailing)
  • [build] Improve TI_WITH_CUDA guards for CUDA related test cases (#6698) (by Zhanlue Yang)
  • [Lang] Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
  • [lang] Improve sparse matrix building on GPU (#6748) (by pengyu)
  • [aot] JSON serde (#6754) (by PENGUINLIONG)
  • [bug] MatrixType bug fix: Fix error with to_numpy() and from_numpy() (#6726) (by Zhanlue Yang)
  • [Doc] Stop mentioning packed mode (#6755) (by Yi Xu)
  • [lang] Get the length of dynamic SNode by x.length() (#6750) (by Lin Jiang)
  • [llvm] Support nested struct with matrix return value on real function (#6734) (by Lin Jiang)
  • [Metal] [error] Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)
  • [build] Integrate backward_cpp to test targets for enabling C++ stack trace (#6697) (by Zhanlue Yang)
  • [aot] Load AOT module from memory (#6692) (#6714) (by PENGUINLIONG)
  • [ci] Add dockerfile.ubuntu-18.04.amdgpu (#6736) (by Zeyu Li)
  • [doc] Update LLVM10 -> LLVM15 in installation guide (#6747) (by Zhanlue Yang)
  • [misc] Fix warnings of taichi examples (#6740) (by PGZXB)
  • [example] Ti-example: instant ngp renderer (#6673) (by Youtian Lin)
  • [build] Use a separate prebuilt llvm15 binary for manylinux environment (#6732) (by Ailing)
taichi - v1.2.2

Published by github-actions[bot] almost 2 years ago

Molten-vk version is downgraded to v1.1.10 to fix a few GGUI issues.

Full changelog:

  • [build] Downgrade molten-vk version to v1.1.10 (#6564) (by Zhanlue Yang)
taichi - v1.2.1

Published by github-actions[bot] almost 2 years ago

This is a bug fix release for v1.2.0.

Full changelog:

  • [mesh] Fix MeshTaichi warnings in CUDA backend (#6369) (by Chang Yu)
  • [Bug] Fix cache_loop_invariant_global_vars pass (#6462) (by Lin Jiang)
taichi - v1.2.0

Published by github-actions[bot] almost 2 years ago

Starting from the v1.2.0 release, Taichi follows semantic versioning: regular releases cut from the master branch bump the MINOR version, and the PATCH version is bumped only when cherry-picking critical bug fixes.
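
The versioning rule can be sketched as a small helper. This is an illustration only: the function name and the "regular" / "cherry-pick" labels are made up here, and resetting PATCH to 0 on a MINOR bump follows the usual semantic versioning convention rather than anything stated in these notes.

```python
def bump(version, kind):
    """Return the next version string for a release of the given kind."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind == "regular":        # release cut from master: bump MINOR
        return f"{major}.{minor + 1}.0"
    if kind == "cherry-pick":    # critical bug fix only: bump PATCH
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release kind: {kind}")


print(bump("1.2.0", "cherry-pick"))  # 1.2.1
print(bump("1.2.2", "regular"))      # 1.3.0
```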

Deprecation Notice

Indexing multi-dimensional ti.ndrange() with a single loop index will be disallowed in future releases.

Highlights

New features

Offline Cache

We introduced the offline cache on CPU and CUDA backends in v1.1.0. In this release, we support this feature on other backends, including Vulkan, OpenGL, and Metal.

  • If your code behaves abnormally, disable offline cache by setting the environment variable TI_OFFLINE_CACHE=0 or offline_cache=False in the ti.init() method call and file an issue with us on Taichi's GitHub repo.
  • See Offline cache for more information.
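
The two switches mentioned above can be set from Python as follows. Either one should be sufficient on its own; this sketch sets both. The helper name is hypothetical, and the Taichi call is left as a comment so that the snippet runs without Taichi installed.

```python
import os


def offline_cache_off():
    """Set TI_OFFLINE_CACHE=0 and return the matching ti.init() keyword."""
    # The environment variable has to be in place before ti.init() is called.
    os.environ["TI_OFFLINE_CACHE"] = "0"
    return {"offline_cache": False}


init_kwargs = offline_cache_off()

# With Taichi installed, the keyword would be passed on like this:
#   import taichi as ti
#   ti.init(**init_kwargs)
print(os.environ["TI_OFFLINE_CACHE"], init_kwargs)
```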

GDAR (Global Data Access Rule)

A checker is provided for detecting potential violations of global data access rules.

  1. The checker only works in debug mode. To enable it, set debug=True when calling ti.init().
  2. Set validation=True when using ti.ad.Tape() to validate the kernels captured by ti.ad.Tape().
    If a violation occurs, the checker pinpoints the line of code breaking the rules.

For example:

import taichi as ti
ti.init(debug=True)

N = 5
x = ti.field(dtype=ti.f32, shape=N, needs_grad=True)
loss = ti.field(dtype=ti.f32, shape=(), needs_grad=True)
b = ti.field(dtype=ti.f32, shape=(), needs_grad=True)

@ti.kernel
def func_1():
    for i in range(N):
        loss[None] += x[i] * b[None]

@ti.kernel
def func_2():
    b[None] += 100

b[None] = 10
with ti.ad.Tape(loss, validation=True):
    func_1()
    func_2()

"""
taichi.lang.exception.TaichiAssertionError:
(kernel=func_2_c78_0) Breaks the global data access rule. Snode S10 is overwritten unexpectedly.
File "across_kernel.py", line 16, in func_2:
    b[None] += 100
    ^^^^^^^^^^^^^^
"""

Improvements

Performance

Improved Vulkan performance with loops (#6072) (by Lin Jiang)

Python Frontend

  • PrefixSumExecutor is added to improve the performance of prefix-sum operations. The legacy prefix-sum function allocates auxiliary GPU buffers on every call, which causes a noticeable performance problem. The new PrefixSumExecutor avoids these repeated allocations: for arrays of the same length, it needs to be initialized only once and can then perform any number of prefix-sum operations without redundant field allocations. The prefix-sum operation is currently supported only on the CUDA backend. (#6132) (by Yu Zhang)

    Usage:

    N = 100
    dtype = ti.i32  # element type, chosen here for illustration
    arr0 = ti.field(dtype, N)
    arr1 = ti.field(dtype, N)
    arr2 = ti.field(dtype, N)
    arr3 = ti.field(dtype, N)
    arr4 = ti.field(dtype, N)

    # initialize arr0, arr1, arr2, arr3, arr4, ...
    # ...

    # Perform an inclusive, in-place parallel prefix sum.
    # Only one executor is needed for a given array length.
    executor = ti.algorithms.PrefixSumExecutor(N)
    executor.run(arr0)
    executor.run(arr1)
    executor.run(arr2)
    executor.run(arr3)
    executor.run(arr4)
    
  • Runtime integer overflow detection on addition, subtraction, multiplication and shift left operators on Vulkan, CPU and CUDA backends is now available when debug mode is on. To use overflow detection on Vulkan backend, you need to enable printing, and the overflow detection of 64-bit multiplication on Vulkan backend requires NVIDIA driver 510 or higher. (#6178) (#6279) (by Lin Jiang)

    For the following program:

    import taichi as ti
    
    ti.init(debug=True)
    
    @ti.kernel
    def add(a: ti.u64, b: ti.u64) -> ti.u64:
        return a + b
    
    add(2 ** 63, 2 ** 63)

    The following warning is printed at runtime:

    Addition overflow detected in File "/home/lin/test/overflow.py", line 7, in add:
        return a + b
               ^^^^^
    
  • Printing is now supported on the Vulkan backend on Unix/Windows platforms. To enable printing on the Vulkan backend, follow the instructions at https://docs.taichi-lang.org/docs/master/debugging#applicable-backends (#6075) (by Ailing)
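
As a reference for the PrefixSumExecutor item in the list above, an inclusive in-place prefix sum replaces each element with the sum of all elements up to and including it. The following sequential plain-Python sketch shows the expected result; it does not use Taichi, and the helper name is illustrative.

```python
from itertools import accumulate


def inclusive_prefix_sum_inplace(arr):
    """Overwrite arr so that arr[i] becomes arr[0] + ... + arr[i]."""
    arr[:] = list(accumulate(arr))


arr0 = [1, 2, 3, 4]
inclusive_prefix_sum_inplace(arr0)
print(arr0)  # [1, 3, 6, 10]
```

A small reference like this can be handy for checking the output of the GPU version on short arrays.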

GGUI

Taichi Examples

Three new examples from community contributors are also merged in this release. They include:

  • Animating the fundamental solution of a Laplace equation (#6302) (by @bismarckkk)
  • Animating the Kármán vortex street using LBM (#6249) (by @hietwl)
  • Animating the two-stream instability (#6185) (by JiaoLuhuai)

You can view these examples by running ti example in a terminal and selecting the corresponding index.

Important bug fixes

  • "ti.data_oriented" class instance now correctly releases its allocated memory upon garbage collection. (#6256) (by Zhanlue Yang)
  • "ti.fields" can now be correctly indexed using non-i32 typed indices. (#6276) (by Zhanlue Yang)
  • "ti.select" and "ti.ifte" can now be printed correctly in Taichi Kernels. (#6297) (by Zhanlue Yang)
  • Before this release, setting u64 arguments with numbers greater than 2^63 raised an error, and u64 return values were treated as i64 in Python (integers greater than 2^63 were returned as negative numbers). Both bugs are fixed in this release. (#6267) (#6364) (by Lin Jiang)
  • Taichi now raises an error when the number of the loop variables does not match the dimension of the ndrange for loop instead of malfunctioning. (#6360) (by Lin Jiang)
  • Calling ti.append with a vector/matrix now throws a proper error message. (#6322) (by Ailing)
  • Division on unsigned integers now works properly on LLVM backends. (#6128) (by Yi Xu)
  • Operator ">>=" now works properly. (#6153) (by Yi Xu)
  • Numpy int is now allowed for SNode shape setting. (#6211) (by Yi Xu)
  • Dimension check for GlobalPtrStmt is now aware of whether it is a cell access. (#6275) (by Yi Xu)
  • Before this release, Taichi autodiff could fail in cases where the condition of an if statement depends on the index of an outer for-loop. The bug has been fixed in this release. (#6207) (by Mingrui Zhang)
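
The u64 item in the list above, and the add(2 ** 63, 2 ** 63) overflow example earlier, both come down to 64-bit two's-complement arithmetic. The plain-Python sketch below models that arithmetic with arbitrary-precision integers; the helper names are illustrative and are not part of Taichi.

```python
MASK64 = (1 << 64) - 1


def as_i64(u):
    """Reinterpret the low 64 bits of u as a signed two's-complement value."""
    u &= MASK64
    return u - (1 << 64) if u >= (1 << 63) else u


def add_u64(a, b):
    """Wrapping u64 addition; also report whether the true sum overflowed."""
    total = a + b
    return total & MASK64, total > MASK64


print(as_i64(2 ** 63))            # -9223372036854775808
print(add_u64(2 ** 63, 2 ** 63))  # (0, True)
```

The first line shows how a large u64 value turns negative when it is read as i64, and the second shows the wraparound that the debug-mode overflow detection reports.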

Full changelog:

  • [Error] Deprecate ndrange with number of the loop variables != the dimension of the ndrange (#6422) (by Lin Jiang)
  • Adjust aot_demo.sh (by jim19930609)
  • [error] Warn Linux users about manylinux2014 build on startup (#6416) (by Proton)
  • [misc] Bug fix (by jim19930609)
  • [misc] Bump version (by jim19930609)
  • [vulkan] [bug] Stop using the buffer device address feature on macOS (#6415) (by Yi Xu)
  • [Lang] [bug] Allow filling a field with Expr (#6391) (by Yi Xu)
  • [misc] Rc v1.2.0 cherry-pick PR number 2 (#6384) (by Zhanlue Yang)
  • [misc] Revert PR 6360 (#6386) (by Zhanlue Yang)
  • [misc] Rc v1.2.0 c1 (#6380) (by Zhanlue Yang)
  • [bug] Fix potential bug in #6362 (#6363) (#6371) (by Zhanlue Yang)
  • [example] Add example "laplace equation" (#6302) (by 猫猫子Official)
  • [ci] Android Demo: leave Docker containers intact for debugging (#6357) (by Proton)
  • [autodiff] Skip gradient kernel compilation for validation kernel (#6356) (by Mingrui Zhang)
  • [autodiff] Move autodiff gdar checker to release (#6355) (by Mingrui Zhang)
  • [aot] Removed constraint on same-allocation copy (#6354) (by PENGUINLIONG)
  • [ci] Add new performance monitoring (#6349) (by Proton)
  • [dx12] Only use llvm to compile dx12. (#6339) (by Xiang Li)
  • [opengl] Fix with_opengl when TI_WITH_OPENGL is off (#6353) (by Ailing)
  • [Doc] Add instructions about running clang-tidy checks locally (by Ailing Zhang)
  • [build] Enable readability-redundant-member-init in clang-tidy check (by Ailing Zhang)
  • [build] Enable TI_WITH_VULKAN and TI_WITH_OPENGL for clang-tidy checks (by Ailing Zhang)
  • [build] Enable a few modernize checks in clang-tidy (by Ailing Zhang)
  • [autodiff] Recover kernel autodiff mode after validation (#6265) (by Mingrui Zhang)
  • [test] Adjust rtol for sparse_linear_solver tests (#6352) (by Ailing)
  • [lang] MatrixType bug fix: Fix array indexing with MatrixType-index (#6323) (by Zhanlue Yang)
  • [Lang] MatrixNdarray refactor part13: Add scalarization for TernaryOpStmt (#6314) (by Zhanlue Yang)
  • [Lang] MatrixNdarray refactor part12: Add scalarization for AtomicOpStmt (#6312) (by Zhanlue Yang)
  • [build] Enable a few modernize checks in clang-tidy (by Ailing Zhang)
  • [build] Enable google-explicit-constructor check in clang-tidy (by Ailing Zhang)
  • [build] Enable google-build-explicit-make-pair check in clang-tidy (by Ailing Zhang)
  • [build] Enable a few bugprone related rules in clang-tidy (by Ailing Zhang)
  • [build] Enable modernize-use-override in clang-tidy (by Ailing Zhang)
  • [ci] Use .clang-tidy for check_static_analyzer job (by Ailing Zhang)
  • [mesh] Support arm64 backend for MeshTaichi (#6329) (by Chang Yu)
  • [lang] Throw proper error message if calling ti.append with vector/matrix (#6322) (by Ailing)
  • [aot] Fixed buffer device address import (#6326) (by PENGUINLIONG)
  • [aot] Fixed export of get_instance_proc_addr (#6324) (by PENGUINLIONG)
  • [build] Allow building test when LLVM is off (#6327) (by Ailing)
  • [bug] Fix generating LLVM AOT module for the second time failed (#6311) (by PGZXB)
  • [aot] Per-parameter documentation in C-API header (#6317) (by PENGUINLIONG)
  • [ci] Revert "Add end-to-end CI tests for meshtaichi (#6321)" (#6325) (by Proton)
  • [ci] Add end-to-end CI tests for meshtaichi (#6321) (by yixu)
  • [doc] Update the document about offline cache (#6313) (by PGZXB)
  • [aot] Include taichi_cpu.h in taichi.h (#6315) (by Zhanlue Yang)
  • [Vulkan] [bug] Change the format string of 64bit unsigned integer type from %llu to %lu (#6308) (by Lin Jiang)
  • [mesh] Refactor MeshTaichi API (#6306) (by Chang Yu)
  • [lang] MatrixType bug fix: Allow dynamic_index=True when real_matrix_scalarize=True (#6304) (by Yi Xu)
  • [lang] MatrixType bug fix: Enable irpass::cfg_optimization if real_matrix_scalarize is on (#6300) (by Zhanlue Yang)
  • [metal] Enable offline cache by default on Metal (#6307) (by PGZXB)
  • [Vulkan] Add overflow detection on vulkan when debug=True (#6279) (by Lin Jiang)
  • [aot] Inline documentations (#6301) (by PENGUINLIONG)
  • [aot] Support exporting interop info for TiMemory on Cpu/Cuda backends (#6242) (by Zhanlue Yang)
  • [lang] MatrixType bug fix: Avoid checks for legacy Matrix-class when real_matrix is on (#6292) (by Zhanlue Yang)
  • [aot] Support setting vector/matrix argument in C++ wrapper of C-API (#6298) (by Ailing)
  • [lang] MatrixType bug fix: Fix MatrixType validations in build_call_if_is_type() (#6294) (by Zhanlue Yang)
  • [bug] Fix asserting failed when registering kernels with same name on Metal (#6271) (by PGZXB)
  • [ci] Add more release tests (#5839) (by Proton)
  • [lang] MatrixType bug fix: Allow indexing a matrix r-value (#6291) (by Yi Xu)
  • [bug] Fix duplicate runs with 'run_tests.py --cpp -k' when selecting AOT tests (#6296) (by Zhanlue Yang)
  • [bug] Fix segmentation fault with TextureOpStmt ir_printer (#6297) (by Zhanlue Yang)
  • [ci] Add taichi-aot-demo headless demos (#6280) (by Proton)
  • [bug] Serialize missing fields of metal::TaichiKernelAttributes and metal::KernelAttributes (#6270) (by PGZXB)
  • [metal] Implement offline cache cleaning on metal (#6272) (by PGZXB)
  • [aot] Reorganized C-API headers (#6199) (by PENGUINLIONG)
  • [lang] [bug] Fix setting integer arguments within u64 range but greater than i64 range (#6267) (by Lin Jiang)
  • [autodiff] Skip gdar checking for user defined grad kernel (#6273) (by Mingrui Zhang)
  • [bug] Fix AotModuleBuilder::add_compiled_kernel (#6287) (by PGZXB)
  • [Bug] [lang] Make dimension check for GlobalPtrStmt aware of whether it is a cell access (#6275) (by Yi Xu)
  • [refactor] Move setting visible device to vulkan instance initialization (by Ailing Zhang)
  • [bug] Add unit test to detect memory leak from data_oriented classes (#6278) (by Zhanlue Yang)
  • [aot] Ship runtime *.bc files with C-API for LLVM AOT (#6285) (by Zhanlue Yang)
  • [bug] Convert non-i32 type indices to i32 for GlobalPtrStmt (#6276) (by Zhanlue Yang)
  • [Doc] Renamed syntax.md to kernel_function.md, plus miscellaneous edits (#6277) (by Vissidarte-Herman)
  • [lang] Fixed validation scope (#6262) (by PENGUINLIONG)
  • [bug] Prevent ti.kernel from directly caching the passed-in arguments to avoid memory leak (#6256) (by Zhanlue Yang)
  • [autodiff] Add demote atomics before gdar checker (#6266) (by Mingrui Zhang)
  • [autodiff] Add grad check feature and related test (#6245) (by PhrygianGates)
  • [lang] Fixed contraction cast (#6255) (by PENGUINLIONG)
  • [Example] Add karman vortex street example (#6249) (by Zhao Liang)
  • [ci] Lift GitHub CI timeout (#6260) (by Proton)
  • [metal] Support offline cache on metal (#6227) (by PGZXB)
  • [dx12] Add DirectX-Headers as a submodule (#6259) (by Xiang Li)
  • [bug] Fix link error with TI_WITH_OPENGL:BOOL=ON but TI_WITH_VULKAN:BOOL=OFF (#6257) (by PGZXB)
  • [dx12] Disable DX12 for cpu only test. (#6253) (by Xiang Li)
  • [Lang] MatrixNdarray refactor part11: Fuse ExternalPtrStmt and PtrOffsetStmt (#6189) (by Zhanlue Yang)
  • [Doc] Rename index.md to hello_world.md (#6244) (by Vissidarte-Herman)
  • [Doc] Update syntax.md (#6236) (by Zhao Liang)
  • [spirv] Generate OpBitFieldUExtract for BitExtractStmt (#6208) (by Yi Xu)
  • [Bug] [lang] Allow numpy int as snode dimension (#6211) (by Yi Xu)
  • [doc] Update document about building and running Taichi C++ tests (#6228) (by PGZXB)
  • [misc] Disable the offline cache if printing ir is enabled (#6234) (by PGZXB)
  • [vulkan] [opengl] Enable offline cache by default on Vulkan and OpenGL (#6233) (by PGZXB)
  • [Doc] Update math_module.md (#6235) (by Zhao Liang)
  • [Doc] Update debugging.md (#6238) (by Zhao Liang)
  • [dx12] Add ti.dx12. (#6174) (by Xiang Li)
  • [lang] Set ret_type for AtomicOpStmt (#6213) (by Ailing)
  • [Doc] Update global settings (#6201) (by Olinaaaloompa)
  • [doc] Editorial updates (#6216) (by Vissidarte-Herman)
  • [Doc] Update hello world (#6191) (by Olinaaaloompa)
  • [Doc] Update math module (#6203) (by Olinaaaloompa)
  • [Doc] Update profiler (#6214) (by Olinaaaloompa)
  • [autodiff] Store if condition in adstack (#6207) (by Mingrui Zhang)
  • [Doc] Update debugging.md (#6212) (by Zhao Liang)
  • [Doc] Update debugging.md (#6200) (by Zhao Liang)
  • [bug] Fixed type inference error with ExternalPtrStmt (#6210) (by Zhanlue Yang)
  • [example] Request to add my code into examples (#6185) (by JiaoLuhuai)
  • [Lang] MatrixNdarray refactor part10: Remove redundant MatrixInitStmt generated from scalarization (#6171) (by Zhanlue Yang)
  • [aot] Apply ti_get_last_error_message() for all C-API test cases (#6195) (by Zhanlue Yang)
  • [llvm] [refactor] Merge create_call and call (#6192) (by Lin Jiang)
  • [build] Support executing manually-specified cpp tests for run_tests.py (#6206) (by Zhanlue Yang)
  • [doc] Editorial updates to field.md (#6202) (by Vissidarte-Herman)
  • [Lang] MatrixNdarray refactor part9: Add scalarization for AllocaStmt (#6168) (by Zhanlue Yang)
  • [Lang] Support GPU solve with analyzePattern and factorize (#6158) (by pengyu)
  • [Lang] MatrixField refactor 9/n: Allow dynamic index of matrix field when real_matrix=True (#6194) (by Yi Xu)
  • [Doc] Fixed broken links (#6193) (by Olinaaaloompa)
  • [ir] MatrixField refactor 8/n: Rename PtrOffsetStmt to MatrixPtrStmt (#6187) (by Yi Xu)
  • [Doc] Update field.md (#6182) (by Zhao Liang)
  • [bug] Relax dependent Pillow version (#6170) (by Ailing)
  • [Doc] Update data_oriented_class.md (#6181) (by Zhao Liang)
  • [Doc] Update kernels and functions (#6176) (by Zhao Liang)
  • [Doc] Update type.md (#6180) (by Zhao Liang)
  • [Doc] Update getting started (#6175) (by Zhao Liang)
  • [llvm] MatrixField refactor 7/n: Simplify codegen for TensorType allocation and access (#6169) (by Yi Xu)
  • [LLVM] Add runtime overflow detection on LLVM-based backends (#6178) (by Lin Jiang)
  • Revert "[LLVM] Add runtime overflow detection on LLVM-based backends" (#6177) (by Ailing)
  • [dx12] Add aot for dx12. (#6099) (by Xiang Li)
  • [LLVM] Add runtime overflow detection on LLVM-based backends (#6166) (by Lin Jiang)
  • [doc] C-API documentation & generator (#5736) (by PENGUINLIONG)
  • [gui] Support for setting the initial position of GGUI window (#6156) (by Mocki)
  • [metal] Maintain a print string table per kernel (#6160) (by PGZXB)
  • [Lang] MatrixNdarray refactor part8: Add scalarization for BinaryOpStmt with TensorType-operands (#6086) (by Zhanlue Yang)
  • [Doc] Refactor debugging (#6102) (by Olinaaaloompa)
  • [doc] Updated the position of Sparse Matrix (#6167) (by Vissidarte-Herman)
  • [Doc] Refactor global settings (#6071) (by Zhao Liang)
  • [Doc] Refactor external arrays (#6065) (by Zhao Liang)
  • [Doc] Refactor simt (#6151) (by Zhao Liang)
  • [Doc] Refactor Profiler (#6142) (by Olinaaaloompa)
  • [Doc] Add doc for math module (#6145) (by Zhao Liang)
  • [aot] Fixed texture interop (#6164) (by PENGUINLIONG)
  • [misc] Remove TI_UI namespace macros (#6163) (by Lin Jiang)
  • [llvm] Add comment about the structure of the CodeGen (#6150) (by Lin Jiang)
  • [Bug] [lang] Fix augmented assign for sar (#6153) (by Yi Xu)
  • [Test] Add scipy to test GPU sparse solver (#6162) (by pengyu)
  • [bug] Fix crashing when loading old offline cache files (for gfx backends) (#6157) (by PGZXB)
  • [lang] Remove print at the end of parallel sort (#6161) (by Haidong Lan)
  • [misc] Move some offline cache utils from analysis/ to util/ (#6155) (by PGZXB)
  • [Lang] Matrix/Vector refactor: support basic matrix ops (#6077) (by Mike He)
  • [misc] Remove namespace macros (#6154) (by Lin Jiang)
  • [Doc] Update gui_system (#6152) (by Zhao Liang)
  • [aot] Track layouts for imported image & tests (#6138) (by PENGUINLIONG)
  • [ci] Fix build cache problems (#6149) (by Proton)
  • [Misc] Add prefix sum executor to avoid multiple field allocations (#6132) (by YuZhang)
  • [opt] Cache loop-invariant global vars to local vars (#6072) (by Lin Jiang)
  • [aot] Improve C++ wrapper implementation (#6146) (by PENGUINLIONG)
  • [doc] Refactored ODOP (#6143) (by Vissidarte-Herman)
  • [Lang] Support basic sparse matrix operations on GPU. (#6082) (by Jiafeng Liu)
  • [Lang] MatrixField refactor 6/n: Add tests for MatrixField scalarization (#6137) (by Yi Xu)
  • [vulkan] Fix SPV physical ptr load alignment (#6139) (by Bob Cao)
  • [bug] Let every thread has its own CompileConfig (#6124) (by Lin Jiang)
  • [refactor] Remove redundant codegen of floordiv (#6135) (by Yi Xu)
  • [doc] Miscellaneous editorial updates (#6131) (by Vissidarte-Herman)
  • Revert "[spirv] Fixed OpLoad with physical address" (#6136) (by Lin Jiang)
  • [bug] [llvm] Fix is_same_type when the suffix of a type is the prefix of the suffix of the other type (#6126) (by Lin Jiang)
  • [bug] [vulkan] Only enable non_semantic_info cap when validation layer is on (#6129) (by Ailing)
  • [Llvm] Fix codegen for div (unsigned) (#6128) (by Yi Xu)
  • [Lang] MatrixField refactor 5/n: Lower access of matrix field element into CHI IR (#6119) (by Yi Xu)
  • [Lang] Fix invalid assertion for matrix values (#6125) (by Zhanlue Yang)
  • [opengl] Fix GLES support (#6121) (by Ailing)
  • [Lang] MatrixNdarray refactor part7: Add scalarization for UnaryOpStmt with TensorType-operand (#6080) (by Zhanlue Yang)
  • [doc] Editorial updates (#6116) (by Vissidarte-Herman)
  • [misc] Allow more commits in changelog generation (#6115) (by Yi Xu)
  • [aot] Import MoltenVK (#6090) (by PENGUINLIONG)
  • [vulkan] Instruct users to install vulkan sdk if they want to use validation layer (#6098) (by Ailing)
  • [ci] Use local caches on self-hosted runners, and code refactoring. (#5846) (by Proton)
  • [misc] Bump version to v1.1.4 (#6112) (by Taichi Gardener)
  • [doc] Fixed a broken link (#6111) (by Vissidarte-Herman)
  • [doc] Update explanation on data-layout (#6110) (by Qian Bao)
  • [Doc] Move developer utilities to contribution (#6109) (by Olinaaaloompa)
  • [Doc] Added Accelerate PyTorch (#6106) (by Vissidarte-Herman)
  • [Doc] Refactor ODOP (#6013) (by Zhao Liang)
  • [opengl] Support offline cache on opengl (#6104) (by PGZXB)
  • [build] Fix building with TI_WITH_OPENGL:BOOL=OFF and TI_WITH_DX11:BOOL=ON failed (#6108) (by PGZXB)
taichi - v1.1.3

Published by github-actions[bot] about 2 years ago

Highlights:

  • Aot module
    • Added texture interfaces to C-API (#5520) (by PENGUINLIONG)
  • Bug fixes
    • Disable vkCmdWriteTimestamp with MacOS to enable tests on Vulkan (#6020) (by Zhanlue Yang)
    • Fix printing i8/u8 (#5893) (by Yi Xu)
    • Fix wrong type cast in codegen of storing quant floats (#5818) (by Yi Xu)
    • Remove wrong optimization: Float x // 1 -> x (#5672) (by Yi Xu)
  • Build system
    • Clean up Taichi core cmake (#5595) (by Bo Qiao)
  • CI/CD workflow
    • Update torch and cuda version (#6054) (by pengyu)
  • Documentation
    • Refactor field (#6006) (by Zhao Liang)
    • Update docstring of pow() (#6046) (by Yi Xu)
    • Fix spelling of numerical and nightly in README.md (#6025) (by Lauchlin)
    • Added Accelerate Python (#5940) (by Vissidarte-Herman)
    • New FAQs added (#5784) (by Olinaaaloompa)
    • Update type cast (#5831) (by Zhao Liang)
    • Update global_settings.md (#5764) (by Zhao Liang)
    • Update init docstring (#5759) (by Zhao Liang)
    • Add introduction to quantized types (#5705) (by Yi Xu)
    • Add docs for GGUI's new features (#5647) (by Mocki)
    • Add introduction to forward mode autodiff (#5680) (by Mingrui Zhang)
    • Add doc about offline cache (#5646) (by Mingming Zhang)
    • Typo in the doc. (#5652) (by dongqi shen)
  • Error messages
    • Add error when breaking/continuing a static for inside non-static if (#5755) (by Lin Jiang)
    • Do not show warning when the offline cache path does not exist (#5747) (by Lin Jiang)
  • Language and syntax
    • Sort coo to build correct csr format sparse matrix on GPU (#6050) (by pengyu)
    • MatrixNdarray refactor part6: Add scalarization for LocalLoadStmt & GlobalLoadStmt with TensorType (#6024) (by Zhanlue Yang)
    • MatrixField refactor 4/n: Disallow invalid matrix field definition (#6074) (by Yi Xu)
    • Fixes matrix-vector multiplication (#6014) (by Mike He)
    • MatrixNdarray refactor part5: Add scalarization for LocalStoreStmt & GlobalStoreStmt with TensorType (#5946) (by Zhanlue Yang)
    • Deprecate SOA-layout for NdarrayMatrix/NdarrayVector (#6030) (by Zhanlue Yang)
    • Indexing for new local matrix implementation (#5783) (by Mike He)
    • Make scalar kernel arguments immutable (#5990) (by Lin Jiang)
    • Demote pow() with integer exponent (#6044) (by Yi Xu)
    • Support abs(i64) (#6018) (by Yi Xu)
    • MatrixNdarray refactor part4: Lowered TensorType to CHI IR level for elementwise-indexed MatrixNdarray (#5936) (by Zhanlue Yang)
    • MatrixNdarray refactor part3: Enable TensorType for MatrixNdarray at Frontend IR level (#5900) (by Zhanlue Yang)
    • Support linear system solving on GPU with cuSolver (#5860) (by pengyu)
    • MatrixNdarray refactor part2: Remove redundant members in python-scope AnyArray (#5885) (by Zhanlue Yang)
    • MatrixNdarray refactor part1: Refactor Taichi kernel argument to use TensorType (#5881) (by Zhanlue Yang)
    • MatrixNdarray refactor part0: Support direct TensorType construction in Ndarray and refactor use of element_shape (#5875) (by Zhanlue Yang)
    • Enable definition of local matrices/vectors (#5782) (by Mike He)
    • Build csr sparse matrix on GPU using coo format ndarray (#5838) (by pengyu)
    • Add @python_scope decorator for selected MatrixNdarray/VectorNdarray methods (#5844) (by Zhanlue Yang)
    • Make python scope comparison return 1 instead of -1 (#5840) (by daylily)
    • Allow implicit conversion of integer types in if conditions (#5763) (by daylily)
    • Support sparse matrix on GPU (#5185) (by pengyu)
    • Improve loop error message and remove the check for real type id (#5792) (by Zhao Liang)
    • Implement index validation for matrices/vectors (#5605) (by Mike He)
  • MeshTaichi
    • Fix nested mesh for (#6062) (by Chang Yu)
  • Vulkan backend
    • Track image layout internally (#5597) (by PENGUINLIONG)

Full changelog:

  • [bug] [gui] Fix a bug of drawing mesh instancing that cpu/cuda objects have an offset when copying to vulkan object (#6028) (by Mocki)
  • [bug] Fix cleaning cache failed (#6100) (by PGZXB)
  • [aot] Support multi-target builds for Apple M1 (#6083) (by PENGUINLIONG)
  • [spirv] [refactor] Rename debug_ segment to names_ (#6094) (by Ailing)
  • [dx12] Update codegen for range_for and mesh_for (#6092) (by Xiang Li)
  • [gui] Direct image presentation & faster direct copy routine (#6085) (by Bob Cao)
  • [vulkan] Support printing in debug mode on vulkan backend (#6075) (by Ailing)
  • [bug] Fix crashing when loading old offline cache files (#6089) (by PGZXB)
  • [ci] Update prebuild binary for llvm 15. (#6091) (by Xiang Li)
  • [example] Add RHI examples (#5969) (by Bob Cao)
  • [aot] Pragma once in taichi.cpp (#6088) (by PENGUINLIONG)
  • [Lang] Sort coo to build correct csr format sparse matrix on GPU (#6050) (by pengyu)
  • [build] Refactor test infrastructure for AOT tests (#6064) (by Zhanlue Yang)
  • [Lang] MatrixNdarray refactor part6: Add scalarization for LocalLoadStmt & GlobalLoadStmt with TensorType (#6024) (by Zhanlue Yang)
  • [Lang] MatrixField refactor 4/n: Disallow invalid matrix field definition (#6074) (by Yi Xu)
  • [bug] Remove unnecessary lower() in AotModuleBuilder::add (#6068) (by PGZXB)
  • [lang] Preserve shape info for Vectors (#6076) (by Mike He)
  • [misc] Simplify PR template (#6063) (by Ailing)
  • [Bug] Disable vkCmdWriteTimestamp with MacOS to enable tests on Vulkan (#6020) (by Zhanlue Yang)
  • [bug] Set cfg.offline_cache after reset() (#6073) (by PGZXB)
  • [ci] [dx12] Enable dx12 build for windows cpu ci. (#6069) (by Xiang Li)
  • [ci] Upgrade conda cudatoolkit version to 11.3 (#6070) (by Proton)
  • [Mesh] [bug] Fix nested mesh for (#6062) (by Chang Yu)
  • [Lang] Fixes matrix-vector multiplication (#6014) (by Mike He)
  • [ir] MatrixField refactor 3/n: Add MatrixFieldExpression (#6010) (by Yi Xu)
  • [dx12] Drop code for llvm passes which prepare for DXIL generation. (#5998) (by Xiang Li)
  • [aot] Guard C-API interfaces with try-catch (#6060) (by PENGUINLIONG)
  • [CI] Update torch and cuda version (#6054) (by pengyu)
  • [Lang] MatrixNdarray refactor part5: Add scalarization for LocalStoreStmt & GlobalStoreStmt with TensorType (#5946) (by Zhanlue Yang)
  • [Lang] Deprecate SOA-layout for NdarrayMatrix/NdarrayVector (#6030) (by Zhanlue Yang)
  • [aot] Dump required device capability in AOT module meta (#6056) (by PENGUINLIONG)
  • [Doc] Refactor field (#6006) (by Zhao Liang)
  • [Lang] Indexing for new local matrix implementation (#5783) (by Mike He)
  • [lang] Reformat source indicator in Python convention (#6053) (by PENGUINLIONG)
  • [misc] Enable offline cache in frontend instead of C++ Side (#6051) (by PGZXB)
  • [lang] Remove redundant codegen of integer pow (#6048) (by Yi Xu)
  • [Doc] Update docstring of pow() (#6046) (by Yi Xu)
  • [Lang] Make scalar kernel arguments immutable (#5990) (by Lin Jiang)
  • [build] Fix compile error on gcc (#6047) (by PGZXB)
  • [llvm] [refactor] Split LLVMCompiledData of kernels and tasks (#6019) (by Lin Jiang)
  • [Lang] Demote pow() with integer exponent (#6044) (by Yi Xu)
  • [doc] Refactor type system (#5984) (by Zhao Liang)
  • [test] Change deprecated make_camera() to Camera() (#6009) (by Zihua Wu)
  • [doc] Fix a typo in README.md (#6033) (by OccupyMars2025)
  • [misc] Lazy load spirv code from disk during offline cache (#6000) (by PGZXB)
  • [aot] Fixed compilation on Linux distros (#6043) (by PENGUINLIONG)
  • [bug] [test] Run C-API tests correctly on Windows (#6038) (by PGZXB)
  • [aot] C-API texture support and tests (#5994) (by PENGUINLIONG)
  • [Doc] Fix spelling of numerical and nightly in README.md (#6025) (by Lauchlin)
  • [doc] Fixed a format issue (#6023) (by Vissidarte-Herman)
  • [doc] Indenting (#6022) (by Vissidarte-Herman)
  • [Lang] Support abs(i64) (#6018) (by Yi Xu)
  • [lang] Merge ti_core.make_index_expr and ti_core.subscript (#5993) (by Zhanlue Yang)
  • [llvm] [refactor] Remove the use of vector with size=1 (#6002) (by Lin Jiang)
  • [bug] [test] Fix patch_os_environ_helper (#6017) (by Lin Jiang)
  • [ci] Remove legacy perf monitoring (to be reworked) (#6015) (by Proton)
  • Fix (#5999) (by PGZXB)
  • [doc] Format updates (#6016) (by Olinaaaloompa)
  • [refactor] Turn on torch_io tests for opengl, vulkan and dx11 backend (#5997) (by Ailing)
  • [ci] Adjust Windows GPU task buildbot tag (#6008) (by Proton)
  • Fixed compilation (#6005) (by PENGUINLIONG)
  • [autodiff] Avoid initializing Field with None (#6007) (by Yi Xu)
  • [doc] Cloth simulation tutorial (#6004) (by Olinaaaloompa)
  • [Lang] MatrixNdarray refactor part4: Lowered TensorType to CHI IR level for elementwise-indexed MatrixNdarray (#5936) (by Zhanlue Yang)
  • [llvm] [refactor] Link modules instead of cloning modules (#5962) (by Lin Jiang)
  • [dx12] Drop code for dxil generation. (#5958) (by Xiang Li)
  • [ci] Windows Build: Use PowerShell 7 (pwsh) (#5996) (by Proton)
  • Use CUDA primary context to work with PyTorch and Numba. (#5992) (by Haidong Lan)
  • [vulkan] Implement offline cache cleaning on vulkan (#5968) (by PGZXB)
  • [ir] MatrixField refactor 2/n: Rename GlobalVariableExpression to FieldExpression (#5989) (by Yi Xu)
  • [build] Add option to generate dependency graph of cmake targets (#5966) (by Ailing)
  • [autodiff] Fix matrix dual (#5985) (by Mingrui Zhang)
  • [doc] Add a note about the offline cache in the print_ir part (#5986) (by Zihua Wu)
  • [ir] MatrixField refactor 1/n: Make a GlobalVariableExpression solely represent a field (#5980) (by Yi Xu)
  • [aot] [opengl] Add GL interop so that we can export GL memory (#5956) (by Ailing)
  • [ci] Temporarily lower CUDA tests parallelism by 1 (4->3) (#5981) (by Proton)
  • [aot] Temporarily allow Vulkan extensions to be automatically enabled in C-API runtimes (#5976) (by PENGUINLIONG)
  • [cuda] Clear cuda context after init (#5891) (by Haidong Lan)
  • [ir] MatrixField refactor 0/n: Remove redundant code for SNode expressions (#5964) (by Yi Xu)
  • [test] [gui] Skip test of drawing mesh instances on Windows platform (#5971) (by Mocki)
  • [test] [gui] Add a test of fetching depth attachment (#5947) (by Mocki)
  • [test] [gui] Add a test of the display of wireframe mode (#5967) (by Mocki)
  • [aot] Support i8/u8 args in cgraph (#5961) (by Ailing)
  • RHI fixes & improvements (#5950) (by Bob Cao)
  • [test] [gui] Add a test of drawing part of mesh instances (#5965) (by Mocki)
  • [bug] Force CMAKE_OSX_ARCHITECTURES in sync with host processor's architecture to avoid ABI issues on Mac m1 (#5952) (by Zhanlue Yang)
  • [doc] Add explanation of usage of TI_VISIBLE_DEVICE and CUDA_VISIBLE_DEVICES (#5910) (by Mocki)
  • [test] [gui] Add a test of drawing mesh instances (#5963) (by Mocki)
  • [test] [gui] Add a test of drawing part of lines (#5957) (by Mocki)
  • [doc] Refactor kernels and functions (#5943) (by Zhao Liang)
  • [dx12] Drop code for dx12 codegen. (#5953) (by Xiang Li)
  • [test] [gui] Add a test of drawing part of mesh (#5955) (by Mocki)
  • [test] [gui] Add a test of drawing part of particles (#5951) (by Mocki)
  • [test] [gui] Add a test of drawing lines (#5948) (by Mocki)
  • [llvm] [refactor] Refactor and add code about llvm modules (#5941) (by Lin Jiang)
  • [llvm] [refactor] Move the generation of struct for function to the context (#5937) (by Lin Jiang)
  • [Doc] Added Accelerate Python (#5940) (by Vissidarte-Herman)
  • [test] [gui] Add a test of fetching color attachment (#5920) (by Mocki)
  • [refactor] Refactor the implementation of cleaning offline cache files (#5934) (by PGZXB)
  • Revert "[vulkan] Less sync overhead for GGUI & Device API Examples (#5880)" (#5945) (by PENGUINLIONG)
  • [vulkan] Less sync overhead for GGUI & Device API Examples (#5880) (by Bob Cao)
  • [autodiff] Fix adjoint checkbit type in gdar checker (#5938) (by Mingrui Zhang)
  • [bug] Fix undefined symbol error by isolating CacheManager as a separate target (#5931) (by PGZXB)
  • [aot] Fix LLVM submit (#5930) (by PENGUINLIONG)
  • [refactor] [llvm] Unify llvm_type() and get_data_type() (#5927) (by Yi Xu)
  • [llvm] Add attributes to LLVMCompiledData (#5929) (by Lin Jiang)
  • [llvm] [refactor] Move the common parts of compilation to the base class (#5926) (by Lin Jiang)
  • Update index.md (#5928) (by Zhao Liang)
  • [doc] Refactor "Getting Started" (#5902) (by Zhao Liang)
  • [autodiff] Support clear gradients by type (#5911) (by Mingrui Zhang)
  • [Lang] MatrixNdarray refactor part3: Enable TensorType for MatrixNdarray at Frontend IR level (#5900) (by Zhanlue Yang)
  • [aot] Taichi C-API C++ wrapper (#5899) (by PENGUINLIONG)
  • [llvm] Enhance function is_same_type (#5922) (by Lin Jiang)
  • [Lang] Support linear system solving on GPU with cuSolver (#5860) (by pengyu)
  • Bugfix: Minor issue for CUPTI kernel profiler (#5879) (by Jack He)
  • [Error] [lang] Add error when breaking/continuing a static for inside non-static if (#5755) (by Lin Jiang)
  • [llvm] [refactor] Rename methods in KernelCodeGen (#5919) (by Lin Jiang)
  • [refactor] [ir] Remove legacy LocalAddress / VectorElement / create_vector_or_scalar_type() (#5918) (by Yi Xu)
  • [autodiff] Fix validate autodiff kernel name lost (#5912) (by Mingrui Zhang)
  • [misc] Disable parallel compilation (#5916) (by Lin Jiang)
  • [refactor] [ir] Remove legacy LaneAttribute (#5901) (by Yi Xu)
  • [aot] [test] Fix c api aot tests for vulkan and opengl backend (by Ailing Zhang)
  • [aot] Add C API for opengl backend (by Ailing Zhang)
  • [doc] Updated forward-mode autodiff (#5894) (by Vissidarte-Herman)
  • [Lang] MatrixNdarray refactor part2: Remove redundant members in python-scope AnyArray (#5885) (by Zhanlue Yang)
  • [refactor] [ir] Remove legacy LaneAttribute usage from ExternalPtrStmt/GlobalPtrStmt (#5898) (by Yi Xu)
  • [ci] Switch windows cpu build to llvm 15. (#5832) (by Xiang Li)
  • [Lang] MatrixNdarray refactor part1: Refactor Taichi kernel argument to use TensorType (#5881) (by Zhanlue Yang)
  • [Bug] [lang] Fix printing i8/u8 (#5893) (by Yi Xu)
  • [misc] Remove FrontendEvalStmt (#5897) (by PGZXB)
  • [refactor] [ir] Remove legacy ElementShuffleStmt (#5892) (by Yi Xu)
  • [bug] Remove mistakenly-added C-API compilation from release pipeline (#5890) (by Zhanlue Yang)
  • [llvm] Fix PtrOffset address for shared array in llvm 15. (#5867) (by Xiang Li)
  • [vulkan] [refactor] [bug] Redesign gfx::OfflineCacheManager to unify compilation of kernels on vulkan  (#5889) (by PGZXB)
  • [doc] Updated Quantized data types (#5886) (by Vissidarte-Herman)
  • [Lang] MatrixNdarray refactor part0: Support direct TensorType construction in Ndarray and refactor use of element_shape (#5875) (by Zhanlue Yang)
  • [refactor] [ir] Remove legacy stmt width (#5882) (by Yi Xu)
  • [bug] [ggui] Fix cpu vulkan interop build (#5865) (by Ailing)
  • [refactor] Separate texture args from scalar arg declaration (#5878) (by Ailing)
  • [Lang] Enable definition of local matrices/vectors (#5782) (by Mike He)
  • [gfx] Unify the implementation of offline cache for gfx backends (#5868) (by PGZXB)
  • [bug] Improve error message with GlobalPtrStmt indexing (#5841) (by Zhanlue Yang)
  • [aot] Workaround build structure to export GGUI symbols in libtaichi_export_core.so (#5870) (by Ailing)
  • [doc] Updated supported backend DX 11 (#5845) (by Vissidarte-Herman)
  • [bug] Enabled NdarrayType & MatrixType annotation parsing for ti.func (#5814) (by Zhanlue Yang)
  • Removed Unexpected debug code in repo (#5866) (by PENGUINLIONG)
  • [aot] C-API error handling mechanism (#5847) (by PENGUINLIONG)
  • [vulkan] Detect and set device-capabilities for aot::TargetDevice used in offline cache (#5843) (by PGZXB)
  • [Lang] Build csr sparse matrix on GPU using coo format ndarray (#5838) (by pengyu)
  • [gui] Add some built-in math APIs for building translation, scale and rotation matrix (#5827) (by Mocki)
  • [Lang] Add @python_scope decorator for selected MatrixNdarray/VectorNdarray methods (#5844) (by Zhanlue Yang)
  • [Lang] Make python scope comparison return 1 instead of -1 (#5840) (by daylily)
  • [vulkan] Support offline cache on Vulkan (#5825) (by PGZXB)
  • [Lang] Allow implicit conversion of integer types in if conditions (#5763) (by daylily)
  • making gravity option used (#5836) (by Michael Xu)
  • [autodiff] Add grad type for SNode (#5805) (by Mingrui Zhang)
  • [Doc] New FAQs added (#5784) (by Olinaaaloompa)
  • [dx12] Drop code for dx12. (#5816) (by Xiang Li)
  • [Doc] Update type cast (#5831) (by Zhao Liang)
  • [autodiff] Fix global data access rule checker memory allocation (#5801) (by Mingrui Zhang)
  • [misc] Bump version to v1.1.3 (#5823) (by Ailing)
  • [doc] Updated compilation warnings (#5808) (by Vissidarte-Herman)
  • Fixed crash in SPIR-V CodeGen when a const is declared twice (#5813) (by PENGUINLIONG)
  • [test] Refactor opengl and vulkan cpp aot tests (#5812) (by Ailing)
  • [Bug] [type] Fix wrong type cast in codegen of storing quant floats (#5818) (by Yi Xu)
  • [autodiff] Support shift ptr in dynamic index (#5770) (by Mingrui Zhang)
  • [ci] Regenerate AOT binaries for every Android smoke test (#5815) (by Proton)
  • [ci] Disable show_env job on fork repo (#5811) (by Bo Qiao)
  • [bug] Fix: GraphBuilder::Sequential unable to handle Matrix-type argument (#5806) (by Zhanlue Yang)
  • [build] [bug] Fix Taichi build with vulkan on and opengl off (#5807) (by Bo Qiao)
  • [test] Add a cgraph test with template args in ti.func (#5803) (by Ailing)
  • [Lang] Support sparse matrix on GPU (#5185) (by pengyu)
  • [test] Enable llvm aot tests for vulkan and opengl backend (#5795) (by Ailing)
  • Fixed a display issue (#5796) (by Vissidarte-Herman)
  • [Lang] Improve loop error message and remove the check for real type id (#5792) (by Zhao Liang)
  • [Doc] Update global_settings.md (#5764) (by Zhao Liang)
  • [ci] Workaround nightly C++ test crash (#5789) (by Proton)
  • [llvm] Disable f16 atomic hack for llvm 15. (#5756) (by Xiang Li)
  • [aot] Workaround C-API build structure to include GGUI symbols (#5787) (by Zhanlue Yang)
  • [ci] Skip C++ tests on macOS + CPU (#5778) (by Proton)
  • [Doc] Update init docstring (#5759) (by Zhao Liang)
  • [doc] Fix struct to numpy example typo (#5781) (by Garry Ling)
  • [test] Get rid of utils.py in aot python scripts (#5785) (by Ailing)
  • [refactor] Refactor C++ aot test to accommodate multiple backends (by Ailing Zhang)
  • [test] Expand python aot test coverage for opengl backend (by Ailing Zhang)
  • [opengl] Fix target device for opengl aot (by Ailing Zhang)
  • Fixed GGUI scene mesh memory leak (#5779) (by PENGUINLIONG)
  • [autodiff] [test] Recover the forward mode test cases (#5696) (by Mingrui Zhang)
  • [ci] Test the offline cache every day (#5768) (by PGZXB)
  • [lang] Update scan impl with shared memory usage (#5762) (by Bo Qiao)
  • Revert "[aot] Added basic infrastructure for gui_utils interfaces - unimplemented (#5688)" (#5760) (by Zhanlue Yang)
  • Fixed window size crash (#5765) (by PENGUINLIONG)
  • [llvm] Support SharedArray global when lower PtrOffsetStmt. (#5758) (by Xiang Li)
  • [ci] Prevent using the offline cache of previous jobs (#5734) (by Lin Jiang)
  • [gui] Add GGUI set_image support for non-Vector fields and numpy ndarrays. (#5654) (by Carbene)
  • [Lang] Implement index validation for matrices/vectors (#5605) (by Mike He)
  • [aot] Added basic infrastructure for gui_utils interfaces - unimplemented (#5688) (by Zhanlue Yang)
  • [Error] Do not show warning when the offline cache path does not exist (#5747) (by Lin Jiang)
  • [vulkan] Fixed query pool invalid usage (#5717) (by PENGUINLIONG)
  • [llvm] Fix crash caused on ByVal Attribute when switch to llvm 15. (#5745) (by Xiang Li)
  • [bug] Fix incorrect autodiff_mode information in offline cache key (#5737) (by Mingming Zhang)
  • [lang] Add parallel scan prefix sum utility (#5697) (by Bo Qiao)
  • [AOT] Added texture interfaces to C-API (#5520) (by PENGUINLIONG)
  • [autodiff] Move the global data access rule checker to experimental (#5719) (by Mingrui Zhang)
  • [ci] Rename lite test CI label, force full test on rc branch (#5732) (by Proton)
  • [ci] Fix TI_SKIP_CPP_TESTS (#5720) (by Proton)
  • [misc] Bump version to v1.1.1 (#5726) (by Taichi Gardener)
  • Improve Windows build script (#5611) (by PENGUINLIONG)
  • Remove miscommited file (#5727) (by Bo Qiao)
  • [test] Fix autodiff test for unsupported shift ptr (#5723) (by Mingrui Zhang)
  • [Doc] [type] Add introduction to quantized types (#5705) (by Yi Xu)
  • Fix shared array for all Vulkan versions. (#5722) (by Haidong Lan)
  • [autodiff] Clear all dual fields when exiting context manager (#5716) (by Mingrui Zhang)
  • [bug] Support indexing via np.integer for field (#5712) (by Ailing)
  • [vulkan] Relax number of array args for each kernel (#5689) (by Ailing)
  • [bug] Properly delete functions of a SNode Tree (#5710) (by Lin Jiang)
  • [Doc] Add docs for GGUI's new features (#5647) (by Mocki)
  • [gui] Support set_image with texture (#5655) (by PENGUINLIONG)
  • [Doc] Add introduction to forward mode autodiff (#5680) (by Mingrui Zhang)
  • [autodiff] Fix AdStackAllocaStmt not correctly backup (#5692) (by Mingrui Zhang)
  • [doc] Rename ti.struct_class to ti.dataclass (#5706) (by Yi Xu)
  • [gui] GGUI renames (#5704) (by PENGUINLIONG)
  • [Vulkan] Track image layout internally (#5597) (by PENGUINLIONG)
  • [ci] Confine show_environ task to Linux bots (#5677) (by Proton)
  • [build] [refactor] Decouple GUI source files from taichi_core target (#5676) (by Bo Qiao)
  • [ci] Rename libcommon.sh -> common-utils.sh, remove expore core build task (#5673) (by Proton)
  • [ci] Temporarily disable a M1 vulkan test (#5701) (by Proton)
  • [doc] Add comments to explain the commit Id for Build Andriod Demos CI pipeline (#5700) (by Zhanlue Yang)
  • [bug] Fix ndarray arg with shape=(1,) in cgraph (#5666) (by Ailing)
  • [doc] Fix typo (#5695) (by Proton)
  • [ci] Temporarily disable M1 vulkan tests (bot3 is ill) (#5698) (by Proton)
  • [build] Add commit id to taichi-aot-demo for Build-Andriod-Demos CI pipeline (#5693) (by Zhanlue Yang)
  • [bug] Fix potential bug of loading of offline cache (#5682) (by Mingming Zhang)
  • [aot] Add Comet Demo to AOT test cases (#5671) (by Zhanlue Yang)
  • [Doc] Add doc about offline cache (#5646) (by Mingming Zhang)
  • [lang] Make vector a real class (#5653) (by PENGUINLIONG)
  • [spirv] [bug] Fix invalid CFG error when simulated atomic is present (#5678) (by Ailing)
  • [ci] Fix python/examples crashes in non-lite test mode (#5670) (by Proton)
  • [Bug] [opt] Remove wrong optimization: Float x // 1 -> x (#5672) (by Yi Xu)
  • [bug] [ir] Change the way to convert for-loop with break to while-loop (#5674) (by Lin Jiang)
  • [aot] Add SPH to AOT test cases (#5642) (by Zhanlue Yang)
  • [ci] Only run portion of tests on PR draft (#5626) (by Proton)
  • [autodiff] Use external array fill to initialize and clear seed (#5656) (by Mingrui Zhang)
  • [bug] Fix bug that kernel names are not correctly captured by the profiler (#5651) (by Mingming Zhang)
  • [lang] Give warning about printing in Vulkan (#5661) (by PENGUINLIONG)
  • Support exporting vertex velocity (#5644) (by YuZhang)
  • [aot] Add helper function to construct Ndarray in C-API (#5641) (by Zhanlue Yang)
  • [gui] GGUI scene APIs are broken (#5658) (by PENGUINLIONG)
  • [Doc] Typo in the doc. (#5652) (by dongqi shen)
  • [autodiff] Add the global data access rule checker (by mingrui)
  • [autodiff] Add gradient visited for global data access rule checker (by mingrui)
  • [autodiff] Print more specific error message that autodiff does not support to_numpy (#5630) (by PhrygianGates)
  • [ci] Drop py36 in nightly and release (#5640) (by Ailing)
  • [misc] Explicitly specify base tag commit when running make_changelog.py (#5632) (by Ailing)
  • [aot] Rewrite mpm88 aot test with C-API (#5615) (by Zhanlue Yang)
  • [Build] Clean up Taichi core cmake (#5595) (by Bo Qiao)
taichi - v1.1.2

Published by github-actions[bot] about 2 years ago

This is a bug fix release for v1.1.0.
Full changelog:

taichi - v1.1.0

Published by github-actions[bot] about 2 years ago

Highlights

New features

Quantized data types

High-resolution simulations can deliver great visual quality, but are often limited by the capacity of the onboard GPU memory. This release adds quantized data types, which let you define your own integers, fixed-point numbers, or floating-point numbers with an arbitrary number of bits, so that you can strike a balance between your hardware limits and simulation effects. See Using quantized data types for a comprehensive introduction.
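To make the memory/precision trade-off concrete, here is a minimal plain-Python sketch of fixed-point quantization. It does not use Taichi's quantized type API. The helper names `quantize_fixed` and `dequantize_fixed`, and the choice of a symmetric signed range, are assumptions made purely for illustration.

```python
def quantize_fixed(value: float, bits: int, max_value: float) -> int:
    """Encode `value` as a signed fixed-point integer that fits in `bits` bits.

    Hypothetical helper: values outside [-max_value, max_value] are clamped.
    """
    levels = 2 ** (bits - 1) - 1      # largest representable integer magnitude
    step = max_value / levels         # real-valued size of one integer step
    clamped = max(-max_value, min(max_value, value))
    return round(clamped / step)


def dequantize_fixed(code: int, bits: int, max_value: float) -> float:
    """Decode an integer produced by quantize_fixed back to a float."""
    levels = 2 ** (bits - 1) - 1
    return code * (max_value / levels)


# Store 0.3 in 8 bits instead of 32: less memory, bounded rounding error.
bits, max_value = 8, 2.0
code = quantize_fixed(0.3, bits, max_value)
restored = dequantize_fixed(code, bits, max_value)
step = max_value / (2 ** (bits - 1) - 1)
```

Fewer bits shrink the storage per value but enlarge `step`, and the round-trip error is at most half a step for values inside the representable range.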

Offline cache

A Taichi kernel is implicitly compiled the first time it is called. The compilation results are kept in an online in-memory cache to reduce the overhead of subsequent calls: as long as the kernel function is unchanged, it can be loaded and launched directly. The cache, however, is lost when the program terminates, so the next time you run the program, Taichi has to recompile all kernel functions and rebuild the in-memory cache. As a result, the first launch of a kernel function in every run is slow due to the compilation overhead.
To address this problem, this release adds the offline cache feature, which dumps the compilation cache to disk for future runs. The first-launch overhead can be drastically reduced in subsequent runs. Taichi now constructs and maintains an offline cache by default.
The following table shows the launch overhead of running cornell_box on the CUDA backend with and without offline cache:

Configuration                      Time spent on compilation and cached data loading
Offline cache disabled             24.856s
Offline cache enabled (1st run)    25.435s
Offline cache enabled (2nd run)    0.677s

Note that, for now, the offline cache feature works only on the CPU and CUDA backends. If your code behaves abnormally, disable the offline cache by setting the environment variable TI_OFFLINE_CACHE=0 or by calling ti.init(offline_cache=False), and file an issue on Taichi's GitHub repo. See Offline cache for more information.
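The idea behind a persistent compilation cache can be sketched in plain Python. This is a conceptual illustration only, not Taichi's implementation: the class `DiskKernelCache`, its source-hash key, and the JSON file format are all hypothetical, and the "compilation" step is a stand-in.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class DiskKernelCache:
    """Toy two-level cache: in-memory first, then files on disk (hypothetical)."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.memory = {}          # "online" cache, lost when the process exits
        self.compile_count = 0    # number of real (expensive) compilations

    def _compile(self, source):
        # Stand-in for an expensive compilation step.
        self.compile_count += 1
        return {"binary": source.upper()}

    def load(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key in self.memory:                  # 1. in-memory hit
            return self.memory[key]
        path = self.cache_dir / f"{key}.json"
        if path.exists():                       # 2. on-disk hit from an earlier run
            artifact = json.loads(path.read_text())
        else:                                   # 3. miss: compile and persist
            artifact = self._compile(source)
            path.write_text(json.dumps(artifact))
        self.memory[key] = artifact
        return artifact


cache_dir = tempfile.mkdtemp()

first_run = DiskKernelCache(cache_dir)      # first program run: must compile
first_run.load("def render(): ...")

second_run = DiskKernelCache(cache_dir)     # simulates a fresh process
second_run.load("def render(): ...")        # served from disk, no compilation
```

Keying the cache on a hash of the source means any change to the kernel produces a new key and therefore a fresh compilation, which mirrors the "as long as the kernel function is unchanged" condition described above.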

Forward-mode automatic differentiation

Adds forward-mode automatic differentiation via ti.ad.FwdMode. Unlike the existing reverse-mode automatic differentiation, which computes vector-Jacobian products (VJPs), forward mode computes Jacobian-vector products (JVPs) when evaluating derivatives. Forward mode is therefore much more efficient when a function has more outputs than inputs. Read this example, which demonstrates Jacobian matrix computation in forward mode and reverse mode.
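Independently of Taichi's API, the Jacobian-vector product that forward mode computes can be illustrated with dual numbers in plain Python. The `Dual` class, the `jvp` helper, and the test function `f` below are hypothetical names introduced for this sketch. The sketch only supports addition and multiplication.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative along the seed direction."""
    value: float
    tangent: float

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.tangent * other.value + self.value * other.tangent)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def f(x):
    # One input, three outputs: the case where forward mode pays off.
    return [x * x, 3 * x + 1, x * x * x]


def jvp(func, x, seed=1.0):
    """Evaluate func once and return (outputs, Jacobian applied to seed)."""
    outputs = func(Dual(x, seed))
    return [o.value for o in outputs], [o.tangent for o in outputs]


values, derivatives = jvp(f, 2.0)
```

A single forward pass yields the derivative of all three outputs with respect to the one input. A reverse-mode pass yields the derivatives of only one output, so the same Jacobian would need three passes.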

SharedArray (experimental)

A GPU's shared memory is a small, fast memory that is visible within each thread block (or workgroup in Vulkan). It is widely used in scenarios where performance is a crucial concern. To give you access to your GPU's shared memory, this release adds the SharedArray API under the namespace ti.simt.block.
The following diagram illustrates the performance benefits of Taichi's SharedArray. With SharedArray, Taichi Lang is comparable to or even outperforms the equivalent CUDA code.

[Figure: n-body benchmarking]

Texture (experimental)

Taichi now supports texture bilinear sampling and raw texel fetch on both the Vulkan and OpenGL backends. This feature leverages the hardware texture unit and reduces the need to hand-write bilinear interpolation code in image processing tasks. It also provides an easy way to do texture mapping in tasks such as rasterization or ray-tracing. On the Vulkan backend, Taichi additionally supports image load and store: you can directly manipulate the texels of an image and use this very image in subsequent texture mapping.
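For reference, the kind of hand-written bilinear interpolation that a hardware texture unit performs for you looks roughly like the plain-Python sketch below. The function name `sample_bilinear`, the clamp-to-edge behavior, and the convention that integer coordinates land exactly on texel centers are assumptions of this sketch, not a description of Taichi's texture API.

```python
import math


def sample_bilinear(image, u, v):
    """Sample a 2D grid of floats at continuous coordinates (u, v).

    `image[row][col]` holds the texels; `u` runs along columns and `v` along
    rows. Coordinates outside the image are clamped to the edge.
    """
    height, width = len(image), len(image[0])
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)

    # The four texels surrounding the sample point.
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)

    # Fractional position inside the cell.
    fx, fy = u - x0, v - y0

    # Interpolate along columns first, then along rows.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


image = [[0.0, 1.0],
         [2.0, 3.0]]
center = sample_bilinear(image, 0.5, 0.5)   # average of the four texels
```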

Note that the current texture and image APIs are in the early stages and subject to change. In the future we plan to support bindless textures to extend to tasks such as ray-tracing. We also plan to extend full texture support to all backends that support texture APIs.

Run ti example simple_texture to see an example of texture support!

Improvements

GGUI

  • Supports fetching and storing the depth information of the current scene:
    • In a Taichi field: ti.ui.Window.get_depth_buffer(field);
    • In a NumPy array: ti.ui.Window.get_depth_buffer_as_numpy().
  • Supports drawing 3D lines using Scene.lines(vertices, width).
  • Supports drawing mesh instances. You can pass a list of transformation matrices (ti.Matrix.field(4, 4, ti.f32, shape=N)) and call ti.ui.Scene.mesh_instance(vertices, transforms=TransformMatrixField) to put various mesh instances at different places.
  • Supports showing the wireframe of a mesh when calling Scene.mesh() or Scene.mesh_instance() by setting show_wireframe=True.

Syntax

  • Taichi dataclass: Taichi now recommends using the @ti.dataclass decorator to define struct types, or even attach functions to them. See Taichi dataclasses for more information.

    import math
    import taichi as ti
    from taichi.math import vec3

    @ti.dataclass
    class Sphere:
      center: vec3
      radius: ti.f32

      @ti.func
      def area(self):
        # a function to run in Taichi scope
        return 4 * math.pi * self.radius * self.radius

      def is_zero_sized(self):
        # a Python-scope function
        return self.radius == 0.0
    
  • As shown in the dataclass example above, vec2, vec3, and vec4 in the taichi.math module (same for ivec and uvec) can be directly used as type hints. The numeric precision of these types is determined by default_ip or default_fp in ti.init().

  • More flexible instantiation for a struct or dataclass:
    In earlier releases, to instantiate a taichi.types.struct or a taichi.dataclass, you had to explicitly write out a complete list of member-value pairs, for example:

    ray = Ray(ro=vec3(0), rd=vec3(1, 0, 0), t=1.0)
    

    As of this release, you are given more options. The positional arguments are passed to the struct members in the order they are defined; the keyword arguments set the corresponding struct members. Unspecified struct members are automatically set to zero. For example:

    # use positional arguments to set struct members in order
    ray = Ray(vec3(0), vec3(1, 0, 0), 1.0)
    
    # ro is set to vec3(0) and t will be set to 0
    ray = Ray(vec3(0), rd=vec3(1, 0, 0))
    
    # both ro and rd are set to vec3(0)
    ray = Ray(t=1.0)
    
    # ro is set to vec3(1), rd=vec3(0) and t=0.0
    ray = Ray(1)
    
    # all members are set to 0.
    ray = Ray()
    
  • Supports calling fill() from both the Python scope and the Taichi scope.
    In earlier releases, fill(), a method of the ScalarField and MatrixField classes, could be called only from the Python scope. As of this release, you can call it from either the Python scope or the Taichi scope. See the following code snippet:

    x = ti.field(int, shape=(10, 10))
    x.fill(1)
    
    @ti.kernel
    def test():
        x.fill(-1)
    
  • More flexible initialization for customized matrix types:
    As the following code snippet shows, matrix types created using taichi.types.matrix() or taichi.types.vector() can be initialized more flexibly: Taichi automatically combines the inputs and converts them to a matrix whose shape matches the shape of the target matrix type.

    # mat2 and vec3 are also predefined in the ti.math module;
    # they are defined explicitly here to show how such types are created
    mat2 = ti.types.matrix(2, 2, float)
    vec3 = ti.types.vector(3, float)
    
    m = mat2(1)  # [[1., 1.], [1., 1.]]
    m = mat2(1, 2, 3, 4)  # [[1., 2.], [3., 4.]]
    m = mat2([1, 2], [3, 4])  # [[1., 2.], [3., 4.]]
    m = mat2([1, 2, 3, 4])  # [[1., 2.], [3., 4.]]
    v = vec3(1, 2, 3)
    m = mat2(v, 4)  # [[1., 2.], [3., 4.]]
    
  • Makes ti.f32(x) syntactic sugar for ti.cast(x, ti.f32) if x is neither a literal nor of a compound data type. The same applies to other primitive types such as ti.i32, ti.u8, or ti.f64.

  • More convenient axes order adjustment: A common way to improve the performance of a Taichi program is to adjust the order of axes when laying out field data in memory. In earlier releases, this required in-depth knowledge of the data definition language (the SNode system) and could become an extra burden in situations where sparse data structures are not required. As of this release, Taichi supports specifying the order of axes when defining a Taichi field.

    # Before
    x = ti.field(ti.i32)
    y = ti.field(ti.i32)
    ti.root.dense(ti.i, M).dense(ti.j, N).place(x)  # row-major
    ti.root.dense(ti.j, N).dense(ti.i, M).place(y)  # column-major
    # New syntax
    x = ti.field(ti.i32, shape=(M, N), order='ij')
    y = ti.field(ti.i32, shape=(M, N), order='ji')
    # SoA vs. AoS example
    p = ti.Vector.field(3, ti.i32, shape=(M, N), order='ji', layout=ti.Layout.SOA)
    q = ti.Vector.field(3, ti.i32, shape=(M, N), order='ji', layout=ti.Layout.AOS)
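To see what the axis order changes, the plain-Python sketch below computes where element (i, j) of a dense 2D field would sit in a flat buffer under each order. The helper `linear_offset` is hypothetical and only models a simple dense layout. It is not how Taichi's SNode system is implemented.

```python
def linear_offset(i, j, shape, order):
    """Flat offset of element (i, j) in a dense 2D layout with the given order."""
    m, n = shape
    if order == "ij":    # i is the outer axis, j varies fastest (row-major)
        return i * n + j
    if order == "ji":    # j is the outer axis, i varies fastest (column-major)
        return j * m + i
    raise ValueError(f"unsupported order: {order!r}")


shape = (2, 3)
# Neighbors along j are adjacent in memory under 'ij' ...
row_major = [linear_offset(0, j, shape, "ij") for j in range(3)]
# ... but strided by the size of axis i under 'ji'.
col_major = [linear_offset(0, j, shape, "ji") for j in range(3)]
```

A loop whose innermost index matches the fastest-varying axis touches consecutive memory, which is typically why choosing the right order improves performance.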
    

Important bug fixes

  • Fixed infinite loop when an integer pow() has a negative exponent (#5275)
  • Fixed numerical issues with matrix slicing (#4677)
  • Improved data type checks for ti.ndrange (#4478)

API changes

Added

  • ti.BitpackedFields
  • ti.from_paddle
  • ti.to_paddle
  • ti.FieldsBuilder.lazy_dual
  • ti.math module
  • ti.Texture
  • ti.ref
  • ti.dataclass
  • ti.simt.block.SharedArray

Moved

Old API                       New API
ti.clear_all_gradients        ti.ad.clear_all_gradients
ti.Tape                       ti.ad.Tape
ti.FieldsBuilder.bit_array    ti.FieldsBuilder.quant_array
ti.ui.Window.write_image      ti.ui.Window.save_image
ti.ui.Window.GUI              ti.ui.Window.get_gui

Deprecated

  • ti.ui.make_camera: Please construct cameras with ti.ui.Camera instead.

Deprecation notice

Python 3.6

As announced in the v1.0.0 release, we no longer provide official Python 3.6 wheels through PyPI. Users who need Taichi with Python 3.6 may still build it from source, but support is not guaranteed.

Taichi_GLSL

The taichi_glsl package on PyPI will no longer be maintained as of this release. GLSL-related features will be implemented in the official taichi.math module, which includes data types and handy functions for daily math and shader development:

  • Vector types: vec2, vec3, and vec4.
  • Matrix types: mat2, mat3, and mat4.
  • GLSL functions such as step(), clamp(), and smoothstep().
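
For readers unfamiliar with the GLSL functions named above, here is a scalar reference sketch in plain Python that follows the definitions in the GLSL specification. It is an illustration of the semantics, not the taichi.math source; the assumption is that the taichi.math versions follow the same argument conventions and also accept vector arguments.

```python
def clamp(x, lo, hi):
    """Constrain x to the closed interval [lo, hi]."""
    return min(max(x, lo), hi)


def step(edge, x):
    """Return 0.0 if x is below edge, otherwise 1.0."""
    return 0.0 if x < edge else 1.0


def smoothstep(edge0, edge1, x):
    """Hermite interpolation between 0 and 1 as x goes from edge0 to edge1."""
    t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)


print(clamp(1.5, 0.0, 1.0))       # 1.0
print(step(0.5, 0.25))            # 0.0
print(smoothstep(0.0, 1.0, 0.5))  # 0.5
```

Note that step() and smoothstep() take the edge arguments first and the input value last, which is easy to get backwards when porting shader code.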

macOS 10.14

Official support for macOS Mojave (10.14, released in 2018) will be dropped starting from v1.2.0. Please upgrade your macOS version if possible, or let us know if you have any concerns.

Full changelog:

  • [misc] Update version to v1.1.0 (by Ailing Zhang)
  • [test] Fix autodiff test for unsupported shift ptr (#5723) (by Mingrui Zhang)
  • [Doc] [type] Add introduction to quantized types (#5705) (by Yi Xu)
  • [autodiff] Clear all dual fields when exiting context manager (#5716) (by Mingrui Zhang)
  • [bug] Support indexing via np.integer for field (#5712) (by Ailing)
  • [Doc] Add docs for GGUI's new features (#5647) (by Mocki)
  • [Doc] Add introduction to forward mode autodiff (#5680) (by Mingrui Zhang)
  • [autodiff] Fix AdStackAllocaStmt not correctly backup (#5692) (by Mingrui Zhang)
  • Fix shared array for all Vulkan versions. (#5721) (by Haidong Lan)
  • [misc] Rc v1.1.0 patch3 (#5709) (by Ailing)
  • [bug] RC v1.1.0 patch2 (#5683) (by Ailing)
  • [ci] Temporarily disable a M1 vulkan test (#5703) (by Proton)
  • [Doc] Add doc about offline cache (#5646) (#5686) (by Mingming Zhang)
  • [bug] Fix bug that kernel names are not correctly captured by the profiler (#5651) (#5669) (by Mingming Zhang)
  • [gui] GGUI scene APIs are broken (#5658) (#5667) (by PENGUINLIONG)
  • [release] v1.1.0 patch1 (#5649) (by Ailing)
  • [llvm] Compile serially when num_thread=0 (#5631) (by Lin Jiang)
  • [cuda] Reduce kernel profiler memory usage (#5623) (by Bo Qiao)
  • [doc] Add docstrings for texture related apis (by Ailing Zhang)
  • [Lang] Support from/to_image for textures and add tests (by Ailing Zhang)
  • [gui] Add wireframe mode for mesh & mesh_instance, add slider_int for Window.GUI. (#5576) (by Mocki)
  • avoid redundant compilation (#5607) (by yixu)
  • [misc] Enable offline cache by default (#5613) (by Mingming Zhang)
  • [Lang] Add parameter 'order' to specify layout for scalar, vector, matrix fields (#5617) (by Yi Xu)
  • [autodiff] [example] Add an example for computing Jacobian matrix (#5609) (by Mingrui Zhang)
  • [ci] Add PR tag for dx12. (#5614) (by Xiang Li)
  • fix ti.ui.Space (#5606) (by yixu)
  • [ci] Build Android export core (#5409) (by Proton)
  • [type] Rename module quantized_types to quant (#5608) (by Yi Xu)
  • [llvm] [aot] Add unit tests for Dynamic SNodes with LLVM AOT (#5594) (by Zhanlue Yang)
  • [build] Forcing write file encoding in misc/make_changelog.py (#5604) (by Proton)
  • [llvm] [aot] Add unit tests for Bitmasked SNodes with LLVM AOT (#5593) (by Zhanlue Yang)
  • [GUI] Shifted to a more commonly supported type for set_image (#5514) (by PENGUINLIONG)
  • [gui] Fix snode offset (mesh disappearing bug) (#5579) (by Bob Cao)
  • [refactor] Redesign loading, dumping and cleaning of offline cache (#5578) (by Mingming Zhang)
  • [autodiff] [test] Add more complex for loop test cases for forward mode (#5592) (by Mingrui Zhang)
  • fix num_triangles (#5602) (by yixu)
  • [cuda] Decouple update from sync in kernel profiler (#5589) (by Bo Qiao)
  • Removed unnecessary tags to work around a crowdIn issue. (#5590) (by Vissidarte-Herman)
  • [Lang] Change vec2/3/4 from function calls to types (#5556) (by Zhao Liang)
  • [vulkan] Enable shared array support for vulkan backend (#5583) (by Haidong Lan)
  • [aot] Avoid reserved words when generate C# AOT bindings (#5586) (by Proton)
  • [ci] Update llvm15 prebuild binary. (#5581) (by Xiang Li)
  • [doc] Removed a redundant line break to see if it will fix a CrowdIn issue (#5584) (by Vissidarte-Herman)
  • [type] Refine SNode with quant 10/n: Add validity checks and simplify BitStructType (#5573) (by Yi Xu)
  • [autodiff] [refactor] Rename the parameters to param for forward mode (#5582) (by Mingrui Zhang)
  • [doc] Format fix to work around a crowdIn issue (#5580) (by Vissidarte-Herman)
  • Update syntax.md (#5575) (by Zhao Liang)
  • [doc] Added an mdx-code-block escape hatch syntax to workaround a CrowdIn … (#5574) (by Vissidarte-Herman)
  • [Doc] Update external.md (#5547) (by Zhao Liang)
  • [doc] Add introductions to ambient_elements in llvm_sparse_runtime.md (#5567) (by Zhanlue Yang)
  • [refactor] Unify ways to set external array args (#5565) (by Ailing)
  • [Lang] [type] Refine SNode with quant 9/n: Rename some parameters in quant APIs (#5566) (by Yi Xu)
  • [opt] Improved warning messages for statements (#5564) (by Zhanlue Yang)
  • [bug] Fix android build for taichi-aot-demo (#5560) (by Ailing)
  • [opt] Added llvm::SeparateConstOffsetFromGEPPass() for shared_memory optimizations (#5494) (by Zhanlue Yang)
  • [Lang] [type] Refine SNode with quant 8/n: Replace bit_struct with ti.BitpackedFields (#5532) (by Yi Xu)
  • [build] Enforce local-scoped symbols in static llvm libs (#5553) (by Bo Qiao)
  • [refactor] Unify ways to set ndarray args (#5559) (by Ailing)
  • [gui] [vulkan] Support for drawing mesh instances (#5546) (by Mocki)
  • [llvm] [aot] Added taichi_sparse unit test to C-API for CUDA backend (#5531) (by Zhanlue Yang)
  • Add glFinish to wait_idle (#5538) (by Bo Qiao)
  • [autodiff] Skip ConstStmt when generating alloca for dual (#5554) (by Mingrui Zhang)
  • [ci] Fix macOS nightly build (#5552) (by Proton)
  • Fix potential bug of lang::Program that could be double finalized (#5550) (by Mingming Zhang)
  • [Error] Raise error when using the struct for in python scope (#5536) (by Lin Jiang)
  • [bug] Fix calling make_aot_kernel failed when offline_cache=True (#5537) (by Mingming Zhang)
  • [ci] Move macOS 10.15 workloads to self-hosted runners (#5539) (by Proton)
  • [build] [refactor] Utilize find_cuda_toolkit and clean some target dependencies (#5526) (by Bo Qiao)
  • [autodiff] [test] Add more for-loop tests for forward mode (#5525) (by Mingrui Zhang)
  • [Lang] [bug] Ensure non-i32 compatibility in while statement conditions (#5521) (by daylily)
  • [Lang] Improve error message for ggui on opengl backend (#5509) (by Zhao Liang)
  • [aot] Support texture and rwtexture in cgraph (#5528) (by Ailing)
  • [llvm] Add parallel compilation to CUDA backend (#5519) (by Lin Jiang)
  • [type] [refactor] Decouple quant from SNode 9/n: Remove exponent handling from SNode (#5510) (by Yi Xu)
  • [Lang] Fix numpy and taichi operations problem (#5506) (by Zhao Liang)
  • [Vulkan] Added an interface to get accumulated on-device execution time (#5488) (by PENGUINLIONG)
  • [Async] [refactor] Remove AsyncTaichi (#5523) (by Lin Jiang)
  • [misc] Fix warning at GGUI canvas.circles (#5424) (#5518) (by Proton)
  • [gui] Support rendering lines from a part of VBO (#5495) (by Mocki)
  • [ir] Cast indices of ExternalPtrStmt to ti.i32 (#5516) (by Yi Xu)
  • [Lang] Support syntax sugar for ti.cast (#5515) (by Yi Xu)
  • [Lang] Better struct initialization (#5481) (by Zhao Liang)
  • [example] Make implicit_fem fallback to CPU when CUDA is not available (#5512) (by Yi Xu)
  • [Lang] Make MatrixType support more ways of initialization (#5479) (by Zhao Liang)
  • [Vulkan] Fixed depth texture validation error (#5507) (by PENGUINLIONG)
  • [bug] Fix vulkan source when build for android (#5508) (by Bo Qiao)
  • [refactor] [llvm] Rename CodeGenCPU/CUDA/WASM and CodeGenLLVMCPU/CUDA/WASM (#5500) (by Lin Jiang)
  • [bug] Let the arguments in ti.init override the environment variables (#5497) (by Lin Jiang)
  • [misc] Add debug logging and TI_AUTO_PROF for offline cache (#5503) (by Mingming Zhang)
  • [misc] ti.Tape -> ti.ad.Tape (#5501) (by Zihua Wu)
  • [misc] Support jit offline cache for kernels that call real functions (#5477) (by Mingming Zhang)
  • [doc] Update cpp tests build doc (#5493) (by Bo Qiao)
  • [Lang] Support call field.fill in kernel functions (#5486) (by Zhao Liang)
  • [Lang] [bug] Make comparisons always return i32 (#5487) (by Yi Xu)
  • [gui] [vulkan] Support 3d-lines rendering (#5492) (by Mocki)
  • [autodiff] Switch off parts of store forwarding optimization for autodiff (#5464) (by Mingrui Zhang)
  • [llvm] [aot] Add LLVM to CAPI part 9: Added AOT field tests for LLVM backend in C-API (#5461) (by Zhanlue Yang)
  • [bug] [llvm] Fix GEP when allocating TLS buffer in struct for (#5473) (by Lin Jiang)
  • [gui] [vulkan] Modify some internal APIs (#5484) (by Mocki)
  • [Build] Remove TI_EMSCRIPTENED related code (#5483) (by Bo Qiao)
  • [type] [refactor] Decouple quant from SNode 8/n: Remove redundant handling of llvm15 in codegen_llvm_quant (#5480) (by Yi Xu)
  • [CUDA] Enable shared memory for CUDA (#5429) (by Haidong Lan)
  • [gui] [vulkan] A faster version of depth copy through ti.field/ti.ndarray (copy directly from vulkan to cuda/gpu/cpu) (#5455) (by Mocki)
  • [misc] Add missing members of XXXExpression and FrontendXXXStmt to result of ASTSerializer (#5471) (by Mingming Zhang)
  • [llvm] [aot] Added field tests for LLVM backend in CGraph (#5458) (by Zhanlue Yang)
  • [type] [refactor] Decouple quant from SNode 7/n: Rewrite BitStructStoreStmt codegen without SNode (#5475) (by Yi Xu)
  • [llvm] [aot] Add LLVM to CAPI part 8: Added CGraph tests for LLVM backend in C-API (#5456) (by Zhanlue Yang)
  • [build] [refactor] Rename taichi core and taichi python targets (#5451) (by Bo Qiao)
  • [llvm] [aot] Add LLVM to CAPI part 6: Handle Field initialization in C-API (#5444) (by Zhanlue Yang)
  • [llvm] [aot] Add LLVM to CAPI part 7: Added AOT kernel tests for LLVM backend in C-API (#5447) (by Zhanlue Yang)
  • [error] Throw proper error message when an Ndarray is passed in via ti.template (#5457) (by Ailing)
  • [type] [refactor] Decouple quant from SNode 6/n: Rewrite extract_quant_float() without SNode (#5448) (by Yi Xu)
  • [bug] Set SNode tree id to all SNodes (#5454) (by Lin Jiang)
  • [AOT] Support on-device event (#5433) (by PENGUINLIONG)
  • [llvm] [aot] Add LLVM to CAPI part 5: Added C-API tests for Vulkan and Cuda backend (#5440) (by Zhanlue Yang)
  • [llvm] [bug] Fixing the crash in release tests introduced by a typo in #5381 where we need a deep copy of arglist. (#5441) (by Proton)
  • [llvm] [aot] Add LLVM to CAPI part 4: Enabled C-API tests on CI & Added C-API tests for CPU backend (#5435) (by Zhanlue Yang)
  • [misc] Bump version to v1.0.5 (#5437) (by Proton)
  • [aot] Support specifying vk_api_version in CompileConfig (#5419) (by Ailing)
  • [Lang] Add append attribute to dynamic fields (#5413) (by Zhao Liang)
  • [Lang] Add inf and nan (#5270) (by Zhao Liang)
  • [Doc] Updated docsite structure (#5416) (by Vissidarte-Herman)
  • [ci] Run release tests (#5327) (by Proton)
  • [type] [refactor] Decouple quant from SNode 5/n: Rewrite load_quant_float() without SNode (#5422) (by Yi Xu)
  • [llvm] Allow using clang 15 for COMPILE_LLVM_RUNTIME (#5381) (by Xiang Li)
  • [opengl] Speedup compilation for Nvidia cards (#5430) (by Bob Cao)
  • [Bug] Fix infinite loop when exponent of integer pow is negative (#5275) (by Mike He)
  • [build] [refactor] Move spirv codegen and common targets (#5415) (by Bo Qiao)
  • [autodiff] Check not placed field.dual and add needs_dual (#5412) (by Mingrui Zhang)
  • [bug] Simplify scalar handling in cgraph and relax field_dim check (#5411) (by Ailing)
  • [gui] [vulkan] Support for getting depth information for python users. (#5410) (by Mocki)
  • [AOT] Adjusted C-API for nd-array type conformance (#5417) (by PENGUINLIONG)
  • [type] Decouple quant from SNode 4/n: Add exponent info to BitStructType (#5407) (by Yi Xu)
  • [llvm] Avoid creating new LLVM contexts when updating struct module (#5397) (by Lin Jiang)
  • [build] Enable C-API compilation on CI (#5403) (by Zhanlue Yang)
  • [Lang] Implement assignment by slicing (#5369) (by Mike He)
  • [llvm] [aot] Add LLVM to CAPI part 3: Adapted AOT interfaces for LLVM backend (#5402) (by Zhanlue Yang)
  • [AOT] Fixed Vulkan device import capability settings (#5400) (by PENGUINLIONG)
  • [llvm] [aot] Add LLVM to CAPI part 2: Adapted memory allocation interfaces for LLVM backend (#5396) (by Zhanlue Yang)
  • [autodiff] Add ternary operators for forward mode (#5405) (by Mingrui Zhang)
  • [llvm] [aot] Add LLVM to CAPI part 1: Implemented capi::LlvmRuntime class (#5393) (by Zhanlue Yang)
  • [ci] Add per test hard timeout limit (#5384) (by Proton)
  • [ci] Properly detect $DISPLAY (#5398) (by Proton)
  • [ci] Llvm15 clang10 ci (#5368) (by Xiang Li)
  • [llvm] [bug] Add stop grad to ASTSerializer (#5401) (by Lin Jiang)
  • [autodiff] Add test for ternary operators in reverse mode autodiff (#5395) (by Mingrui Zhang)
  • [llvm] [aot] Add numerical unit tests for LLVM-CGraph (#5319) (by Zhanlue Yang)
taichi - v1.0.4

Published by github-actions[bot] over 2 years ago

Highlights:

  • Documentation
    • Fix typos (#5283) (by Kian-Meng Ang)
    • Update dev_install.md (#5266) (by Vissidarte-Herman)
    • Updated README command lines (#5199) (by Vissidarte-Herman)
    • Modify compilation warnings (#5180) (by Olinaaaloompa)
    • Updated odop.md, removing obsolete information (#5163) (by Vissidarte-Herman)
  • Language and syntax
    • Refine SNode with quant 7/n: Support placing QuantFixedType under quant_array (#5386) (by Yi Xu)
    • Add determinant for 1d case (#5375) (by Zhao Liang)
    • Make floor, ceil and round accept a dtype optional argument (#5307) (by Zhao Liang)
    • Rename struct_class to dataclass (#5365) (by Zhao Liang)
    • Improve ti example so that users can choose which example to run by entering numbers. (#5265) (by Zhao Liang)
    • Refine SNode with quant 5/n: Rename bit_array to quant_array (#5344) (by Yi Xu)
    • Make bit_vectorize a parameter of ti.loop_config (#5334) (by Yi Xu)
    • Refine SNode with quant 3/n: Turn bit_vectorize into an on/off switch (#5331) (by Yi Xu)
    • Add error message for missing init call (#5280) (by Zhao Liang)
    • Fix fractal gui close warning (#5281) (by Zhao Liang)
    • Refine SNode with quant 2/n: Enable struct for on bit_array with bit_vectorize off (#5253) (by Yi Xu)
    • Refactor indexing expressions in AST & enforce integer indices (#5138) (by daylily)

Full changelog:

  • Revert "[llvm] (Decomp of #5251 11/n) Enable parallel compilation on CPU backend (#5394)" (by Proton)
  • [refactor] Default dtype of ndarray type should be None instead of f32 (#5391) (by Ailing)
  • [llvm] (Decomp of #5251 11/n) Enable parallel compilation on CPU backend (#5394) (by Lin Jiang)
  • [gui] [vulkan] Support for python users to control the start index and count number of particles & meshes data. (#5388) (by Mocki)
  • [autodiff] Support binary operators for forward mode (#5389) (by Mingrui Zhang)
  • [llvm] (Decomp of #5251 10/n) Make SNode tree compatible with parallel compilation (#5390) (by Lin Jiang)
  • [llvm] [refactor] (Decomp of #5251 9/n) Refactor CodeGen to support parallel compilation on LLVM backend (#5387) (by Lin Jiang)
  • [Lang] [type] Refine SNode with quant 7/n: Support placing QuantFixedType under quant_array (#5386) (by Yi Xu)
  • [llvm] [refactor] (Decomp of #5251 8/n) Refactor KernelCacheData (#5383) (by Lin Jiang)
  • [cuda] [type] Refine SNode with quant 6/n: Support __ldg for loading QuantFixedType and QuantFloatType (#5374) (by Yi Xu)
  • [doc] Add simt functions in operators (#5333) (by Bo Qiao)
  • [Lang] Add determinant for 1d case (#5375) (by Zhao Liang)
  • [lang] Texture image load store support (#5317) (by Bob Cao)
  • [bug] Cast scalar to right type before converting to uint64 (by Ailing Zhang)
  • [refactor] Check dtype mismatch in cgraph compilation and runtime (by Ailing Zhang)
  • [refactor] Check field_dim mismatch in cgraph compilation and runtime (by Ailing Zhang)
  • [test] Check repeated arg names in cgraph (by Ailing Zhang)
  • [llvm] [refactor] (Decomp of #5251 6/n) Let ModuleToFunctionConverter support multiple modules (#5372) (by Lin Jiang)
  • [Lang] Make floor, ceil and round accept a dtype optional argument (#5307) (by Zhao Liang)
  • [refactor] Rename the confused needs_grad (#5359) (by Mingrui Zhang)
  • [autodiff] Support unary ops for forward mode (#5366) (by Mingrui Zhang)
  • [llvm] (Decomp of #5251 7/n) Change the way to record the time of offline cache (#5373) (by Lin Jiang)
  • [llvm] (Decomp of #5251 5/n) Add the parallel compilation worker to LlvmProgramImpl (#5364) (by Lin Jiang)
  • [gui] [test] Fix bug in test_ggui.py when some pc env do not support ggui (#5370) (by Mocki)
  • [Lang] Rename struct_class to dataclass (#5365) (by Zhao Liang)
  • [llvm] Drop code for llvm 15. (#5313) (by Xiang Li)
  • [llvm] [aot] Rewrite LLVM AOT tests with LlvmRuntimeExecutor (#5358) (by Zhanlue Yang)
  • [example] Avoid f64 type in simulation/initial_value_problem.py (#5355) (by Proton)
  • [ci] testing: add retention-days for broken wheels (#5326) (by Proton)
  • [test] (Decomp of #5251 4/n) Delete tests for AsyncTaichi (#5357) (by Lin Jiang)
  • [llvm] [refactor] (Decomp of #5251 2/n) Make modulegen a virtual function and let LLVMCompiledData replace ModuleGenValue (#5353) (by Lin Jiang)
  • [gui] Support exporting gif && video in GGUI (#5354) (by Mocki)
  • [autodiff] Handle field accessing by zero for forward mode (#5339) (by Mingrui Zhang)
  • [llvm] [refactor] (Decomp of #5251 3/n) Remove codegen from OffloadedTask and let it replace OffloadedTaskCacheData (#5356) (by Lin Jiang)
  • [refactor] Turn off stack traceback info by default (#5347) (by Ailing)
  • [refactor] (Decomp of #5251 1/n) Move ParallelExecutor out of async engine (#5351) (by Lin Jiang)
  • [Lang] Improve ti example so that users can choose which example to run by entering numbers. (#5265) (by Zhao Liang)
  • [gui] Add get_view_matrix() and get_projection_matrix() APIs for camera (#5345) (by Mocki)
  • [bug] Added warning messages for implicit type conversion for RangeFor boundaries (#5322) (by Zhanlue Yang)
  • [example] Fix simulation/waterwave.py:update race condition (#5346) (by Proton)
  • [Lang] [type] Refine SNode with quant 5/n: Rename bit_array to quant_array (#5344) (by Yi Xu)
  • [llvm] [aot] Added CGraph tests for LLVM backend (#5305) (by Zhanlue Yang)
  • [autodiff] [test] Add for-loop tests for forward mode (#5336) (by Mingrui Zhang)
  • [example] Lower example GUI resolution to fit buildbot display (#5337) (by Proton)
  • [build] [bug] Fix building on macOS 10.14 failed (#5332) (by PGZXB)
  • [llvm] [aot] Replaced LlvmProgramImpl with LlvmRuntimeExecutor for LlvmAotModuleLoader (#5330) (by Zhanlue Yang)
  • [AOT] Fixed certain crashes in C-API (#5335) (by PENGUINLIONG)
  • [Lang] [type] Make bit_vectorize a parameter of ti.loop_config (#5334) (by Yi Xu)
  • [autodiff] Skip store forwarding to keep the GlobalLoadStmt alive (#5315) (by Mingrui Zhang)
  • [llvm] [aot] Modified ModuleToFunctionConverter to use LlvmRuntimeExecutor instead of LlvmProgramImpl (#5328) (by Zhanlue Yang)
  • [llvm] Changed LlvmProgramImpl to save cache_data_ with unique_ptr instead of raw object (#5329) (by Zhanlue Yang)
  • [Lang] [type] Refine SNode with quant 3/n: Turn bit_vectorize into an on/off switch (#5331) (by Yi Xu)
  • [misc] Fix a few compilation warnings (#5325) (by yekuang)
  • [bug] Accept numpy integers in ndrange (#5245) (#5323) (by Proton)
  • [misc] Implement cache file cleaning (#5310) (by PGZXB)
  • Fixed C-API build on Android (#5321) (by PENGUINLIONG)
  • [AOT] Save AOT module artifacts as zip archive (#5316) (by PENGUINLIONG)
  • [llvm] [aot] Added LLVM backend support for Compute Graph (#5294) (by Zhanlue Yang)
  • [AOT] Unity native plugin interfaces (#5273) (by PENGUINLIONG)
  • [autodiff] Check not placed field.grad when needs_grad = True (#5295) (by Mingrui Zhang)
  • [autodiff] Fix alloca block and add control flow test case for forward mode (#5301) (by Mingrui Zhang)
  • [refactor] Synchronize should always be called in non-async mode (#5302) (by Ailing)
  • [Lang] Add error message for missing init call (#5280) (by Zhao Liang)
  • Update prtags.json (#5304) (by Bob Cao)
  • [refactor] Get rid ndarray host accessor kernels (by Ailing Zhang)
  • [refactor] Use device api for CPU/CUDA ndarray (by Ailing Zhang)
  • [refactor] Switch to using staging buffer for metal/vulkan/opengl (by Ailing Zhang)
  • [llvm] Use LlvmProgramImpl::cache_data_ to store compiled kernel info (#5290) (by Zhanlue Yang)
  • [opengl] Texture support in OpenGL (#5296) (by Bob Cao)
  • [build] [refactor] Cleanup backends folder and rename to RHI (#5288) (by Bo Qiao)
  • [Lang] Fix fractal gui close warning (#5281) (by Zhao Liang)
  • [autodiff] [test] Add atomic test for forward autodiff (#5286) (by Mingrui Zhang)
  • [dx11] Fix DX backend with new runtime & Better D3D11 buffer handling (#5244) (by Bob Cao)
  • [autodiff] Set default seed only for scalar parameter to avoid silent unexpected results (#5287) (by Mingrui Zhang)
  • test (#5292) (by Ailing)
  • [AOT] Added C-API for on-device memory copy (#5271) (by PENGUINLIONG)
  • [Doc] Fix typos (#5283) (by Kian-Meng Ang)
  • [autodiff] Support control flow for forward mode (by mingrui)
  • [autodiff] Support for-loop and mutation for forward mode (by mingrui)
  • [autodiff] Refactor dual field allocation (by mingrui)
  • [AOT] Refactor C-API codegen (#5272) (by PENGUINLIONG)
  • Update README.md (#5279) (by Taichi contributor)
  • [metal] Support memcpy_internal via buffer_copy (#5268) (by Ailing)
  • [bug] Fix missing old but useful metadata in offline cache (#5267) (by PGZXB)
  • [Lang] [type] Refine SNode with quant 2/n: Enable struct for on bit_array with bit_vectorize off (#5253) (by Yi Xu)
  • [Doc] Update dev_install.md (#5266) (by Vissidarte-Herman)
  • [build] [bug] Fix dependency for opengl_rhi target (by Bo Qiao)
  • Update fallback order, move opengl behind Vulkan (#5257) (by Bob Cao)
  • [opengl] Move OpenGL backend onto Gfx runtime (#5246) (by Bob Cao)
  • [build] [refactor] Move LLVM source files to target locations (#5254) (by Bo Qiao)
  • [bug] Fixed misuse of std::forward (#5237) (by Zhanlue Yang)
  • [AOT] Added safety checks to prevent hard crashes on failure (#5249) (by PENGUINLIONG)
  • [build] [refactor] Move shaders source files to runtime (#5247) (by Bo Qiao)
  • [example] Fix diff_sph example with --train (#5242) (by Mingrui Zhang)
  • [misc] Add filename option to ti.tools.VideoManager. (#5219) (by Qian Bao)
  • [bug] Throw exceptions when ndrange gets non-integral arguments (#5245) (by Mike He)
  • [build] [refactor] Move wasm and dx11 source files to target locations (#5235) (by Bo Qiao)
  • [type] [bug] Refine SNode with quant 1/n: Fix (atomic_)set_mask_b##N (#5238) (by Yi Xu)
  • [lang] 1d/3d texture support (#5233) (by Bob Cao)
  • [vulkan] Fix OpBranch for reversed RangeForStmt (#5241) (by Mingrui Zhang)
  • [build] Fix -Werror errors for TI_WITH_CUDA_TOOLKIT=ON (#5133) (#5216) (by Proton)
  • [ci] Enable pylint on examples (#5222) (by Proton)
  • [llvm] [aot] Split LlvmRuntimeExecutor from LlvmProgramImpl (#5207) (by Zhanlue Yang)
  • [type] [refactor] Decouple quant from SNode 3/n: Extend bit pointers (#5232) (by Yi Xu)
  • [vulkan] Codegen & runtime improvements (#5213) (by Bob Cao)
  • [gui] Fix the device memory leak when GGUI terminates (by Ailing Zhang)
  • [gui] Let gui and renderer manage the resource they own (by Ailing Zhang)
  • [AOT] Unity language binding generator (#5204) (by PENGUINLIONG)
  • [type] [refactor] Decouple quant from SNode 2/n: Remove physical_type from QuantIntType (#5223) (by Yi Xu)
  • [type] [refactor] Decouple quant from SNode 1/n: Add BitStructTypeBuilder (#5209) (by Yi Xu)
  • [build] [refactor] Move metal source files to target locations (#5208) (by Bo Qiao)
  • [lang] Export a few types from the share library (#5220) (by yekuang)
  • [llvm] [refactor] LLVMProgramImpl code clean up: part-5 (#5197) (by Zhanlue Yang)
  • [spirv] Fixed OpLoad with physical address (#5212) (by PENGUINLIONG)
  • [wip] Enable full wheel build when TI_EXPORT_CORE is on (#5211) (by Ailing)
  • [llvm] [refactor] LLVMProgramImpl code clean up: part-4 (#5189) (by Zhanlue Yang)
  • Move spdlog include to profiler.cpp (#5210) (by Ailing)
  • Fix ti gallery command bug (#5196) (by Zhao Liang)
  • [misc] Improve TI_STATIC_ASSERT compatibility (#5205) (by Yuanming Hu)
  • [llvm] [refactor] LLVMProgramImpl code clean up: part-3 (#5188) (by Zhanlue Yang)
  • Fixed C-API provision (#5203) (by PENGUINLIONG)
  • [lang] Improve error message when literal val is out of range of default dtype (#5191) (by Ailing)
  • [Lang] [ir] Refactor indexing expressions in AST & enforce integer indices (#5138) (by daylily)
  • Remove stale coverage from README.md (#5202) (by yekuang)
  • [ci] Slim cpu build image (#5198) (by Proton)
  • [build] [refactor] Move opengl source files to target locations (#5200) (by Bo Qiao)
  • [example] Fix dtype for metal backend and enforce vulkan (#5201) (by Mingrui Zhang)
  • [Doc] Updated README command lines (#5199) (by Vissidarte-Herman)
  • [llvm] [refactor] LLVMProgramImpl code clean up: part-2 (#5187) (by Zhanlue Yang)
  • [AOT] Support Matrix/Vector as graph arguments (#5165) (by Haidong Lan)
  • [refactor] Enable adaptive block_dim selection for CPU backend (#5190) (by Bo Qiao)
  • [Doc] Modify compilation warnings (#5180) (by Olinaaaloompa)
  • [ci] Save wheel to artifact when test fails (#5186) (by Proton)
  • [gui] Detailed error message when GGUI is not available (#5164) (by Proton)
  • [ci] Run C++ tests on Windows (#5176) (by Proton)
  • [lang] Texture support 3/n (Python changes) (#5174) (by Bob Cao)
  • [llvm] [refactor] LLVMProgramImpl code clean up: part-1 (#5181) (by Zhanlue Yang)
  • [AOT] Implementation of Taichi Runtime C-API (#5168) (by PENGUINLIONG)
  • [refactor] [autodiff] Clean redundant compiled functions and refactor kernel key (#5178) (by Mingrui Zhang)
  • [doc] Add badge on README.md (#5177) (by yanqingzhang)
  • [lang] Texture support 2/n (SPIR-V backend & runtime changes) (#5159) (by Bob Cao)
  • [build] Export cmake config to ease clients usage in Cmake (#5162) (by Bo Qiao)
  • [refactor] [autodiff] Refactor autodiff api and add corresponding tests (#5175) (by Mingrui Zhang)
  • [aot] [llvm] LLVM AOT Field part-4: Added AOT tests for Fields - CUDA backend (#5124) (by Zhanlue Yang)
  • [type] [refactor] Consistently use quant_xxx in quant-related names (#5166) (by Yi Xu)
  • [cuda] Disable reduction in non-full warps (#5161) (by Bob Cao)
  • [autodiff] Support basic operations for forward mode autodiff (by mingrui)
  • [autodiff] Add a context manager for forward mode autodiff (by mingrui)
  • [AOT] C-APIs for Taichi runtime distribution (#5150) (by PENGUINLIONG)
  • [cli] Improve user interface for CLI command ti example (#5153) (by Zhao Liang)
  • [Doc] Updated odop.md, removing obsolete information (#5163) (by Vissidarte-Herman)
  • [autodiff] [refactor] Refactor autodiff tape api and TapeImpl (#5154) (by Mingrui Zhang)
  • [type] [refactor] Separate CustomFixedType from CustomFloatType (#5149) (by Yi Xu)
  • [ui] Properly fix UTF-8 title string by converting to UTF16 (#5155) (by Bob Cao)
  • [aot] [llvm] LLVM AOT Field #3: Added AOT tests for Fields - CPU backend (#5121) (by Zhanlue Yang)
  • Bump version to v1.0.4 (#5157) (by Taichi Gardener)
  • [lang] Texture support 1/n (Context & Programs) (#5139) (by Bob Cao)
taichi - v1.0.3

Published by github-actions[bot] over 2 years ago

Highlights:

  • Aot module
    • Support importing external Vulkan buffers (#5020) (by PENGUINLIONG)
    • Supported inclusion of taichi as subdirectory for AOT modules (#5007) (by PENGUINLIONG)
  • Bug fixes
    • Fix frontend type check for reading a whole bit_struct (#5027) (by Yi Xu)
    • Remove redundant AllocStmt when lowering FrontendWhileStmt (#4870) (by Zhanlue Yang)
  • Build system
    • Improve Windows build script (#4955) (by PENGUINLIONG)
    • Improved building on Windows (#4925) (by PENGUINLIONG)
    • Define Cmake OpenGL runtime target (#4887) (by Bo Qiao)
    • Use keywords instead of plain target_link_libraries CMake (#4864) (by Bo Qiao)
    • Define runtime build target (#4838) (by Bo Qiao)
    • Switch to scikit-build as the build backend (#4624) (by Frost Ming)
  • Documentation
    • Improve ODOP doc structure (#5089) (by Yi Xu)
    • Add documentation of Taichi Struct Classes. (#5075) (by bsavery)
    • Updated type system (#5054) (by Vissidarte-Herman)
    • Branding updates. Also tests netlify. (#4994) (by Vissidarte-Herman)
    • Fix netlify cache & sync doc without pr content (#5003) (by Justin)
    • Update trouble shooting URL in bug report template (#4988) (by Haidong Lan)
    • Updated URL (#4990) (by Vissidarte-Herman)
    • Fix docs deploy netlify test configuration (#4991) (by Justin)
    • Updated relative path (#4929) (by Vissidarte-Herman)
    • Updated broken links (#4912) (by Vissidarte-Herman)
    • Updated links that may break. (#4874) (by Vissidarte-Herman)
    • Add limitation about TLS optimization (#4877) (by Ailing)
  • Examples
    • Fix block_dim warning in ggui (#5128) (by Zhao Liang)
    • Update visual effects of mass_spring_3d_ggui.py (#5081) (by Zhao Liang)
    • Update mass_spring_3d_ggui.py to v2 (#3879) (by Alex Brown)
  • Language and syntax
    • Add more initialization routines for glsl matrix types (#5069) (by Zhao Liang)
    • Support constructing vector and matrix ndarray from ti.ndarray() (by ailzhang)
    • Disallow reading a whole bit_struct (#5061) (by Yi Xu)
    • Struct Classes implementation (#4989) (by bsavery)
    • Add short-circuit if-then-else operator (#5022) (by daylily)
    • Build sparse matrix from ndarray (#4841) (by pengyu)
    • Fix potential precision bug when using math vector and matrix types (#5032) (by Zhao Liang)
    • Refactor quant type definition APIs (#5036) (by Yi Xu)
    • Fix parameter name 'range' for ti.types.quant.fixed (#5006) (by Yi Xu)
    • Refactor quantized_types module and make quant APIs public (#4985) (by Yi Xu)
    • Add more functions to math module (#4939) (by Zhao Liang)
    • Support sparse matrix datatype and storage format configuration (#4673) (by pengyu)
    • Copy-free interaction between Taichi and PaddlePaddle (#4886) (by 0xzhang)
  • LLVM backend (CPU and CUDA)
    • Add AOT builder and loader (#5013) (by yekuang)
  • Metal backend
    • Support Ndarray (#4720) (by yekuang)
  • RFC
    • AOT for all SNodes (#4806) (by yekuang)
  • SIMT programming
    • Add match_all warp intrinsics (#4961) (by Zeyu Li)
    • Add match_any warp intrinsics (#4921) (by Zeyu Li)
    • Add uni_sync warp intrinsics (#4927) (by 0xzhang)
    • Add activemask warp intrinsics (#4918) (by Zeyu Li)
    • Add syncwarp warp intrinsics (#4917) (by Zeyu Li)
  • Vulkan backend
    • Fixed vulkan backend crash on AOT examples (#5047) (by PENGUINLIONG)
  • GitHub Actions/Workflows
    • Update release_test.sh (#4960) (by Chuandong Yan)

Full changelog:

  • [aot] [llvm] LLVM AOT Field #2: Updated LLVM AOTModuleLoader & AOTModuleBuilder to support Fields (#5120) (by Zhanlue Yang)
  • [type] [refactor] Misc improvements to quant codegen (#5129) (by Yi Xu)
  • [ci] Enable yapf and isort on example files (#5140) (by Ailing)
  • [Example] Fix block_dim warning in ggui (#5128) (by Zhao Liang)
  • fix mass_spring_3d_ggui backend (#5127) (by Zhao Liang)
  • [lang] Texture support 0/n: IR changes (#5134) (by Bob Cao)
  • Editorial update (#5119) (by Olinaaaloompa)
  • [aot] [llvm] LLVM AOT Field #1: Adjust serialization/deserialization logics for FieldCacheData (#5111) (by Zhanlue Yang)
  • [aot][bug] Use cached compiled kernel pointer when it's added to graph (#5122) (by Ailing)
  • [aot] [llvm] LLVM AOT Field #0: Implemented FieldCacheData & refactored initialize_llvm_runtime_snodes() (#5108) (by Zhanlue Yang)
  • [autodiff] Add forward mode pipeline for autodiff pass (#5098) (by Mingrui Zhang)
  • [build] [refactor] Move Vulkan runtime out of backends dir (#5106) (by Bo Qiao)
  • [bug] Fix build without llvm backend crash (#5113) (by Bo Qiao)
  • [type] [llvm] [refactor] Fix function names in codegen_llvm_quant (#5115) (by Yi Xu)
  • [llvm] [refactor] Replace cast_int() with LLVM native integer cast (#5110) (by Yi Xu)
  • [type] [refactor] Remove redundant promotion for custom int in type_check (#5102) (by Yi Xu)
  • [Example] Update visual effects of mass_spring_3d_ggui.py (#5081) (by Zhao Liang)
  • [test] Save mpm88 graph in python and load in C++ test. (#5104) (by Ailing)
  • [llvm] [refactor] Move load_bit_pointer() to CodeGenLLVM (#5099) (by Yi Xu)
  • [refactor] Remove ndarray element shape from extra arg buffer (#5100) (by Haidong Lan)
  • [refactor] Update Ndarray constructor used in AOT runtime. (#5095) (by Ailing)
  • clean hidden override functions (#5097) (by Mingrui Zhang)
  • [llvm] [aot] CUDA-AOT PR #2: Implemented AOTModuleLoader & AOTModuleBuilder for LLVM-CUDA backend (#5087) (by Zhanlue Yang)
  • [Doc] Improve ODOP doc structure (#5089) (by Yi Xu)
  • Use pre-calculated runtime size array for gfx runtime. (#5094) (by Haidong Lan)
  • [bug] Minor fix for ndarray element_shape in graph mode (#5093) (by Ailing)
  • [llvm] [refactor] Use LLVM native atomic ops if possible (#5091) (by Yi Xu)
  • [autodiff] Extract shared components for reverse and forward mode (#5088) (by Mingrui Zhang)
  • [llvm] [aot] Add LLVM-CPU AOT tests (#5079) (by Zhanlue Yang)
  • [Doc] Add documentation of Taichi Struct Classes. (#5075) (by bsavery)
  • [build] [refactor] Change CMake global include_directories to target based function (#5082) (by Bo Qiao)
  • [autodiff] Allocate dual and adjoint snode (#5083) (by Mingrui Zhang)
  • [refactor] Make sure Ndarray shape is field shape (#5085) (by Ailing)
  • [llvm] [refactor] Merge AtomicOpStmt codegen in CPU and CUDA backends (#5086) (by Yi Xu)
  • [llvm] [aot] CUDA-AOT PR #1: Extracted common logics from CPUAotModuleImpl into LLVMAotModule (#5072) (by Zhanlue Yang)
  • [infra] Refactor Vulkan runtime into true Common Runtime (#5058) (by Bob Cao)
  • [refactor] Correctly set ndarray element_size and nelement (#5080) (by Ailing)
  • [cuda] [simt] Add assertions for warp intrinsics on old GPUs (#5077) (by Bo Qiao)
  • [Lang] Add more initialization routines for glsl matrix types (#5069) (by Zhao Liang)
  • [spirv] Specialize element shape for spirv codegen. (#5068) (by Haidong Lan)
  • [llvm] Specialize element shape for LLVM backend (#5071) (by Haidong Lan)
  • [doc] Fix broken link for github action status badge (#5076) (by Ailing)
  • [Example] Update mass_spring_3d_ggui.py to v2 (#3879) (by Alex Brown)
  • [refactor] Resolve comments from #5065 (#5074) (by Ailing)
  • [Lang] Support constructing vector and matrix ndarray from ti.ndarray() (by ailzhang)
  • [refactor] Pass element_shape and layout to C++ Ndarray (by ailzhang)
  • [refactor] Specialized Ndarray Type is (element_type, shape, layout) (by ailzhang)
  • [aot] [CUDA-AOT PR #0] Refactored compile_module_to_executable() to CUDAModuleToFunctionConverter (#5070) (by Zhanlue Yang)
  • [refactor] Split GraphBuilder out of Graph class (#5064) (by Ailing)
  • [build] [bug] Ensure the assets folder is copied to the project directory (#5063) (by Frost Ming)
  • [bug] Remove operator ! for Expr (#5062) (by Yi Xu)
  • [Lang] [type] Disallow reading a whole bit_struct (#5061) (by Yi Xu)
  • [Lang] Struct Classes implementation (#4989) (by bsavery)
  • [Lang] [ir] Add short-circuit if-then-else operator (#5022) (by daylily)
  • [bug] Ndarray type should include primitive dtype as well (#5052) (by Ailing)
  • [Doc] Updated type system (#5054) (by Vissidarte-Herman)
  • [bug] Added type promotion support for atan2 (#5037) (by Zhanlue Yang)
  • [Lang] Build sparse matrix from ndarray (#4841) (by pengyu)
  • Set host_write to false for opengl ndarray (#5038) (by Ailing)
  • [ci] Run cpp tests via run_tests.py (#5035) (by yekuang)
  • Exit CI builds when download of prebuilt packages fails (#5043) (by PENGUINLIONG)
  • [Vulkan] Fixed vulkan backend crash on AOT examples (#5047) (by PENGUINLIONG)
  • [Lang] Fix potential precision bug when using math vector and matrix types (#5032) (by Zhao Liang)
  • [Metal] Support Ndarray (#4720) (by yekuang)
  • [Lang] [type] Refactor quant type definition APIs (#5036) (by Yi Xu)
  • [aot] Bind graph APIs to python and add mpm88 example (#5034) (by Ailing)
  • [aot] Move ArgKind as first argument in Arg class (by ailzhang)
  • [aot] Serialize built graph, deserialize and run. (by ailzhang)
  • [ci] Disable win cpu docker job test (#5033) (by Bo Qiao)
  • [doc] Update OS names (#5030) (by Bo Qiao)
  • fix fast_gui rgba bug (#5031) (by Zhao Liang)
  • [Bug] [type] Fix frontend type check for reading a whole bit_struct (#5027) (by Yi Xu)
  • [AOT] Support importing external Vulkan buffers (#5020) (by PENGUINLIONG)
  • [SIMT] Add match_all warp intrinsics (#4961) (by Zeyu Li)
  • [bug] Revert freeing ndarray memory when python GC triggers (#5019) (by Ailing)
  • [ci] Fix nightly macos (#5018) (by Bo Qiao)
  • [Llvm] Add AOT builder and loader (#5013) (by yekuang)
  • [aot] Build and run graph without serialization (by Ailing Zhang)
  • [test] Unify kernel setup for ndarray related tests (by Ailing Zhang)
  • [ci] [build] Enable ccache for windows docker (#5001) (by Frost Ming)
  • [refactor] Move get ndarray data ptr to program (#5012) (by pengyu)
  • [bug] Fixed numerical error for Atomic-Sub between unsigned values with different number of bits (#5011) (by Zhanlue Yang)
  • [llvm] Add serializable LlvmLaunchArgInfo (#4992) (by yekuang)
  • [doc] Update community section (#4943) (by yanqingzhang)
  • [SIMT] Add match_any warp intrinsics (#4921) (by Zeyu Li)
  • [Lang] [type] Fix parameter name 'range' for ti.types.quant.fixed (#5006) (by Yi Xu)
  • [misc] Version bump: v1.0.2 -> v1.0.3 (#5008) (by Haidong Lan)
  • [AOT] Supported inclusion of taichi as subdirectory for AOT modules (#5007) (by PENGUINLIONG)
  • [Doc] Branding updates. Also tests netlify. (#4994) (by Vissidarte-Herman)
  • [refactor] Get rid of data_ptr_ in Ndarray (by Ailing Zhang)
  • [refactor] Move ndarray fast fill methods to Program (by Ailing Zhang)
  • [refactor] Free ndarray's memory when python GC triggers (by Ailing Zhang)
  • [refactor] Construct ndarray from existing DeviceAllocation. (by Ailing Zhang)
  • [test] Add test for Ndarray from DeviceAllocation (by Ailing Zhang)
  • [refactor] Program owns allocated ndarrays. (by Ailing Zhang)
  • [Doc] Fix netlify cache & sync doc without pr content (#5003) (by Justin)
  • [test] Fix a few mis-configured ndarray tests (#5000) (by Ailing)
  • Update README.md (by Vissidarte-Herman)
  • [Lang] [type] Refactor quantized_types module and make quant APIs public (#4985) (by Yi Xu)
  • [Doc] Update trouble shooting URL in bug report template (#4988) (by Haidong Lan)
  • [Doc] Updated URL (#4990) (by Vissidarte-Herman)
  • [Doc] Fix docs deploy netlify test configuration (#4991) (by Justin)
  • [llvm] Use serializer for LLVM cache (#4982) (by yekuang)
  • Provision of prebuilt LLVM 10 for VS2022 (#4987) (by PENGUINLIONG)
  • [Workflow] Update release_test.sh (#4960) (by Chuandong Yan)
  • [cuda] Add block and grid level intrinsic for cuda backend (#4977) (by YuZhang)
  • [bug] Fix infinite recursion of get_offline_cache_key_of_snode_impl() (#4983) (by PGZXB)
  • [misc] Add ASTSerializer::visit(ReferenceExpression *) (#4984) (by PGZXB)
  • [llvm] Support both BC and LL cache format (#4979) (by yekuang)
  • [refactor] Improve serializer and cleanup utils (#4980) (by yekuang)
  • [Build] Improve Windows build script (#4955) (by PENGUINLIONG)
  • [llvm] Make cache writer support BC format (#4978) (by yekuang)
  • [ci] [build] Containerize Windows CPU build and test (#4933) (by Bo Qiao)
  • [llvm] Make codegen produce static llvm::Module (#4975) (by yekuang)
  • [test] Add an ndarray test in C++. (#4972) (by Ailing)
  • [build] Fixed Ilegal Instruction Error when importing PaddlePaddle module (#4969) (by Zhanlue Yang)
  • [llvm] Create ModuleToFunctionConverter (#4962) (by yekuang)
  • [bug] [simt] Fix the problem that some intrinsics are never called (#4957) (by Yi Xu)
  • [vulkan] Set kApiVersion to VK_API_VERSION_1_3 (#4970) (by Haidong Lan)
  • [ci] Add new buildbot with latest driver for Linux/Vulkan test (#4953) (by Bo Qiao)
  • [RFC] AOT for all SNodes (#4806) (by yekuang)
  • [llvm] Move cache directory to dump() (#4963) (by yekuang)
  • [lang] Add reference type support on real functions (#4889) (by Lin Jiang)
  • [refactor] Some renamings (#4959) (by yekuang)
  • [refactor] Add ArrayMetadata to store the array runtime size (#4950) (by yekuang)
  • [lang] [bug] Implement Expression serializing and fix some bugs (#4931) (by PGZXB)
  • [Lang] Add more functions to math module (#4939) (by Zhao Liang)
  • [Build] Improved building on Windows (#4925) (by PENGUINLIONG)
  • [ci] Fix Nightly (#4948) (by Bo Qiao)
  • [build] Limit -Werror to Clang-compiler only (#4947) (by Zhanlue Yang)
  • [refactor] [llvm] Remove struct_compiler_ as a member variable (#4945) (by yekuang)
  • [build] Turned off -Werror temporarily for issues with performance-bot (#4946) (by Zhanlue Yang)
  • [refactor] Remove unused snode_trees in ProgramImpl interface (#4942) (by yekuang)
  • [doc] Updated documentations for implicit type casting rules (#4885) (by Zhanlue Yang)
  • [build] Turn on -Werror on Linux and Mac platforms (#4928) (by Zhanlue Yang)
  • [build] Enable -Werror on Linux & Mac (#4941) (by Zhanlue Yang)
  • [SIMT] Add uni_sync warp intrinsics (#4927) (by 0xzhang)
  • [lang] Fix type check warnings for ti.Mesh (#4930) (by Chang Yu)
  • [Lang] Support sparse matrix datatype and storage format configuration (#4673) (by pengyu)
  • [Doc] Updated relative path (#4929) (by Vissidarte-Herman)
  • [refactor] Simplify Matrix's initializer (#4923) (by yekuang)
  • [build] Warning Suppression PR #4: Fixed warnings with MacOS (#4926) (by Zhanlue Yang)
  • [build] Warning Suppression PR #3: Eliminate warnings from third-party headers (#4920) (by Zhanlue Yang)
  • [SIMT] Add activemask warp intrinsics (#4918) (by Zeyu Li)
  • [build] Warning Suppression PR #1: Turned on -Wno-ignored-attributes & Removed unused functions (#4916) (by Zhanlue Yang)
  • [refactor] Create MatrixImpl to differentiate Taichi and Python scopes (#4853) (by yekuang)
  • [SIMT] Add syncwarp warp intrinsics (#4917) (by Zeyu Li)
  • [build] Warning Suppression PR #2: Fixed codebase warnings (#4909) (by Zhanlue Yang)
  • [test] Exit on error during Paddle windows test (#4910) (by Bo Qiao)
  • [Doc] Updated broken links (#4912) (by Vissidarte-Herman)
  • remove debug print (#4883) (by yixu)
  • [test] Cancel tests for Paddle on GPU (#4914) (by 0xzhang)
  • [Lang] [test] Copy-free interaction between Taichi and PaddlePaddle (#4886) (by 0xzhang)
  • Use Ninja generator on Windows and skip generator test (#4896) (by Frost Ming)
  • [vulkan] Add new VMA vulkan functions. (#4893) (by Bob Cao)
  • [vulkan] Fix typo for waitSemaphoreCount (#4892) (by Gabriel H)
  • [Build] [refactor] Define Cmake OpenGL runtime target (#4887) (by Bo Qiao)
  • [aot] [vulkan] Expose symbols for AOT (#4879) (by yekuang)
  • [bug] Fixed type promotion rule for bit-shift operations (#4884) (by Zhanlue Yang)
  • [Build] [refactor] Use keywords instead of plain target_link_libraries CMake (#4864) (by Bo Qiao)
  • [metal] Migrate runtime's MTLBuffer allocation to unified device API (#4865) (by yekuang)
  • [error] [lang] Improved error messages for illegal slicing or indexing to ti.field (#4873) (by Zhanlue Yang)
  • [Doc] Updated links that may break. (#4874) (by Vissidarte-Herman)
  • [metal] Complete Device API (#4862) (by yekuang)
  • [vulkan] Device API explicit semaphores (#4852) (by Bob Cao)
  • [build] Change the library output dir for export core (#4880) (by Frost Ming)
  • [refactor] Add ASTSerializer and use it to generate offline-cache-key (#4863) (by PGZXB)
  • [ci] Use the updated docker image for libtaichi_export_core (#4881) (by Bo Qiao)
  • [Doc] Add limitation about TLS optimization (#4877) (by Ailing)
  • [Build] [refactor] Define runtime build target (#4838) (by Bo Qiao)
  • [ci] Add libtaichi_export_core build for desktop in CI (#4871) (by Ailing)
  • [build] [bug] Fix a bug of skbuild that loses the root package_dir (#4875) (by Frost Ming)
  • [Bug] Remove redundant AllocStmt when lowering FrontendWhileStmt (#4870) (by Zhanlue Yang)
  • [misc] Bump version to v1.0.2 (#4867) (by Taichi Gardener)
  • [build] Install export core library to build dir (#4866) (by Frost Ming)
  • [Build] Switch to scikit-build as the build backend (#4624) (by Frost Ming)
taichi - v1.0.2

Published by github-actions[bot] over 2 years ago

Highlights:

The v1.0.2 release is a patch fix that improves Taichi's stability on multiple platforms, especially for GGUI and the Vulkan backend.

  • Bug fixes
    • Remove redundant AllocStmt when lowering FrontendWhileStmt (#4870) (by Zhanlue Yang)
  • Build system
    • Define Cmake OpenGL runtime target (#4887) (by Bo Qiao)
    • Use keywords instead of plain target_link_libraries CMake (#4864) (by Bo Qiao)
    • Define runtime build target (#4838) (by Bo Qiao)
    • Switch to scikit-build as the build backend (#4624) (by Frost Ming)
  • Documentation
    • Add limitation about TLS optimization (#4877) (by Ailing)

Full changelog:

  • [ci] Fix Nightly (#4948) (by Bo Qiao)
  • [ci] [build] Containerize Windows CPU build and test (#4933) (by Bo Qiao)
  • [vulkan] Set kApiVersion to VK_API_VERSION_1_3 (#4970) (by Haidong Lan)
  • [ci] Add new buildbot with latest driver for Linux/Vulkan test (#4953) (by Bo Qiao)
  • [vulkan] Add new VMA vulkan functions. (#4893) (by Bob Cao)
  • [vulkan] Fix typo for waitSemaphoreCount (#4892) (by Gabriel H)
  • [Build] [refactor] Define Cmake OpenGL runtime target (#4887) (by Bo Qiao)
  • [Build] [refactor] Use keywords instead of plain target_link_libraries CMake (#4864) (by Bo Qiao)
  • [vulkan] Device API explicit semaphores (#4852) (by Bob Cao)
  • [build] Change the library output dir for export core (#4880) (by Frost Ming)
  • [ci] Use the updated docker image for libtaichi_export_core (#4881) (by Bo Qiao)
  • [Doc] Add limitation about TLS optimization (#4877) (by Ailing)
  • [Build] [refactor] Define runtime build target (#4838) (by Bo Qiao)
  • [ci] Add libtaichi_export_core build for desktop in CI (#4871) (by Ailing)
  • [build] [bug] Fix a bug of skbuild that loses the root package_dir (#4875) (by Frost Ming)
  • [Bug] Remove redundant AllocStmt when lowering FrontendWhileStmt (#4870) (by Zhanlue Yang)
  • [misc] Bump version to v1.0.2 (#4867) (by Taichi Gardener)
  • [build] Install export core library to build dir (#4866) (by Frost Ming)
  • [Build] Switch to scikit-build as the build backend (#4624) (by Frost Ming)
taichi - v1.0.1

Published by github-actions[bot] over 2 years ago

Highlights:

  • Automatic differentiation
    • Implement ti.ad.no_grad to skip autograd (#4751) (by Shawn Yao)
  • Bug fixes
    • Fix and refactor type check for atomic ops (#4858) (by Yi Xu)
    • Fix and refactor type check for local stores (#4843) (by Yi Xu)
    • Fix implicit cast warning for global stores (#4834) (by Yi Xu)
  • Documentation
    • Updated URL (#4847) (by Vissidarte-Herman)
    • LLVM sparse runtime design doc (#4790) (by yekuang)
    • Proofread Getting started (#4682) (by Vissidarte-Herman)
    • Editorial review to fields (advanced) (#4686) (by Vissidarte-Herman)
    • Update docstring for ti.Mesh (#4818) (by Chang Yu)
    • Remove redundant semicolon in path (#4801) (by gaoxinge)
  • Error messages
    • Show warning when serialize=True is set on a struct for (#4844) (by Lin Jiang)
    • Provide source code info in warnings (#4840) (by Yi Xu)
  • Language and syntax
    • Add single character property for vector swizzle && test (#4845) (by Zhao Liang)
    • Remove obsolete vectypes class (#4831) (by LiangZhao)
    • Add support for keyword arguments (#4794) (by Lin Jiang)
    • Support swizzles on all Matrix/Vector types (#4828) (by yekuang)
    • Add 2d and 3d rotation functions to math module (#4822) (by Zhao Liang)
    • Walkaround Vulkan backend behavior which changes cwd on Mac (#4812) (by TiGeekMan)
    • Add mod function to math module (#4809) (by Zhao Liang)
    • Support in-place operator of ti.Matrix in python scope (#4799) (by Lin Jiang)
    • Move short-circuit boolean logic into AST-to-IR passes (#4580) (by daylily)
    • Promote output type of log, exp, and sqrt ops (#4622) (by Andrew Sun)
    • Fix integral type promotion rules (e.g., u8 + u8 now leads to u8 instead of i32) (#4789) (by Yuanming Hu)
    • Add basic complex arithmetic and add a mandelbrot example (#4780) (by Zhao Liang)
  • SIMT programming
    • Add shfl_down_f32 intrinsic. (#4819) (by Chun Cai)

Full changelog:

  • [gui] Avoid implicit type casts in staging_buffer (#4861) (by Yi Xu)
  • [lang] Add better error detection for swizzle patterens (#4860) (by yekuang)
  • [Bug] [ir] Fix and refactor type check for atomic ops (#4858) (by Yi Xu)
  • [Doc] Updated URL (#4847) (by Vissidarte-Herman)
  • [bug] Fix bug that building with TI_EXPORT_CORE:BOOL=ON failed (#4850) (by PGZXB)
  • [Error] Show warning when serialize=True is set on a struct for (#4844) (by Lin Jiang)
  • [lang] Group related Matrix methods closer (#4836) (by yekuang)
  • [Lang] Add single character property for vector swizzle && test (#4845) (by Zhao Liang)
  • [Bug] [ir] Fix and refactor type check for local stores (#4843) (by Yi Xu)
  • [Error] Provide source code info in warnings (#4840) (by Yi Xu)
  • [misc] Update pre-commit hooks (#4713) (by pre-commit-ci[bot])
  • [Bug] [ir] Fix implicit cast warning for global stores (#4834) (by Yi Xu)
  • [mesh] Remove link hints from ti.Mesh (#4825) (by yixu)
  • [Lang] Remove obsolete vectypes class (#4831) (by LiangZhao)
  • [doc] Fix doc link (#4835) (by yekuang)
  • [Doc] LLVM sparse runtime design doc (#4790) (by yekuang)
  • [Lang] Add support for keyword arguments (#4794) (by Lin Jiang)
  • [Lang] Support swizzles on all Matrix/Vector types (#4828) (by yekuang)
  • [test] Add simple test for offline-cache-key of compile-config (#4805) (by PGZXB)
  • [vulkan] Device API blending (#4815) (by Bob Cao)
  • [spirv] Fix int casts (#4814) (by Bob Cao)
  • [gui] Only call ImGui_ImplVulkan_Shutdown if it's initialized (#4827) (by Ailing)
  • [ci] Use a new PAT for project with org permission (#4826) (by Frost Ming)
  • [Lang] Add 2d and 3d rotation functions to math module (#4822) (by Zhao Liang)
  • [Doc] Proofread Getting started (#4682) (by Vissidarte-Herman)
  • [Doc] Editorial review to fields (advanced) (#4686) (by Vissidarte-Herman)
  • [bug] Fix bug that building with gcc9.4 will fail (#4823) (by PGZXB)
  • [SIMT] Add shfl_down_f32 intrinsic. (#4819) (by Chun Cai)
  • [workflow] Add issues to project when issue opened (#4816) (by Frost Ming)
  • [vulkan] Fix vulkan initialization on macOS with cpu backend (#4813) (by Bob Cao)
  • [Doc] [mesh] Update docstring for ti.Mesh (#4818) (by Chang Yu)
  • [vulkan] Fix Vulkan device score bug (#4803) (by Andrew Sun)
  • [Lang] Walkaround Vulkan backend behavior which changes cwd on Mac (#4812) (by TiGeekMan)
  • [misc] Add SNode to offline-cache key (#4716) (by PGZXB)
  • [Lang] Add mod function to math module (#4809) (by Zhao Liang)
  • [doc] Fix doc of running C++ tests (#4798) (by Yi Xu)
  • [Lang] Support in-place operator of ti.Matrix in python scope (#4799) (by Lin Jiang)
  • [Lang] [ir] Move short-circuit boolean logic into AST-to-IR passes (#4580) (by daylily)
  • [lang] Fix frontend type check for sqrt, log, exp (#4797) (by Yi Xu)
  • [Doc] Remove redundant semicolon in path (#4801) (by gaoxinge)
  • [Lang] [ir] Promote output type of log, exp, and sqrt ops (#4622) (by Andrew Sun)
  • [ci] Update ci images to use latest git (#4792) (by Bo Qiao)
  • [Lang] Fix integral type promotion rules (e.g., u8 + u8 now leads to u8 instead of i32) (#4789) (by Yuanming Hu)
  • [Lang] Add basic complex arithmetic and add a mandelbrot example (#4780) (by Zhao Liang)
  • Update index.md (#4791) (by Bob Cao)
  • [spirv] Add 16 bit float immediate number (#4787) (by Bob Cao)
  • [ci] Update ubuntu 18.04 image to use latest git (#4785) (by Frost Ming)
  • [lang] Store relations with 16-bit type (#4779) (by Chang Yu)
  • [Autodiff] Implement ti.ad.no_grad to skip autograd (#4751) (by Shawn Yao)
  • [misc] Remove some unnecessary attributes from offline-cache key of compile-config (#4770) (by PGZXB)
  • [doc] Update install instruction with "--upgrade" (#4775) (by Yuanming Hu)
  • Expose VboHelpers class (#4773) (by Ailing)
  • Bump version to v1.0.1 (#4774) (by Taichi Gardener)
  • [refactor] Merge Kernel.argument_names and argument_annotations (#4753) (by dongqi shen)
  • [dx11] Constant buffer binding and AtomicIncrement in RAND_STATE (#4650) (by quadpixels)
taichi - v1.0.0

Published by github-actions[bot] over 2 years ago

v1.0.0 was released on April 13, 2022.

Compatibility changes

License change

Taichi's license has changed from MIT to Apache-2.0 after a public vote in #4607.

Python 3.10 support

This release supports Python 3.10 on all supported operating systems (Windows, macOS, and Linux).

Manylinux2014-compatible wheels

Before v1.0.0, Taichi worked only on Linux distributions that support glibc 2.27+ (for example Ubuntu 18.04+). As of v1.0.0, in addition to the normal Taichi wheels, Taichi provides manylinux2014-compatible wheels that work on most modern Linux distributions, including CentOS 7.

  • The normal wheels support all backends; the incoming manylinux2014-compatible wheels support the CPU and CUDA backends only. Choose the wheels that work best for you.
  • If you encounter any issue when installing the wheels, try upgrading your pip to the latest version first.

Deprecations

  • This release deprecates ti.ext_arr() and uses ti.types.ndarray() instead. ti.types.ndarray() supports both Taichi Ndarrays and external arrays, for example NumPy arrays.
  • Taichi plans to drop support for Python 3.6 in the next minor release (v1.1.0). If you have any questions or concerns, please let us know at #4772.

New features

Non-Python deployment solution

By working together with OPPO US Research Center, Taichi delivers Taichi AOT, a solution for deploying kernels in non-Python environments, such as on mobile devices.

Compiled Taichi kernels can be saved from a Python process, then loaded and run by the provided C++ runtime library. With a set of APIs, your Python/Taichi code can be easily deployed in any C++ environment. We demonstrate the simplicity of this workflow by porting the implicit FEM (finite element method) demo released in v0.9.0 to an Android application. Download the Android package and find out what Taichi AOT has to offer! If you want to try out this solution, please also check out the taichi-aot-demo repo.

# In Python app.py
module = ti.aot.Module(ti.vulkan) 
module.add_kernel(my_kernel, template_args={'x': x})
module.save('my_app')

The following code snippet shows the C++ workflow for loading the compiled AOT modules.

// Initialize Vulkan program pipeline
taichi::lang::vulkan::VulkanDeviceCreator::Params evd_params;
evd_params.api_version = VK_API_VERSION_1_2;
auto embedded_device =
    std::make_unique<taichi::lang::vulkan::VulkanDeviceCreator>(evd_params);

std::vector<uint64_t> host_result_buffer;
host_result_buffer.resize(taichi_result_buffer_entries);
taichi::lang::vulkan::VkRuntime::Params params;
params.host_result_buffer = host_result_buffer.data();
params.device = embedded_device->device();
auto vulkan_runtime = std::make_unique<taichi::lang::vulkan::VkRuntime>(std::move(params));

// Load AOT module saved from Python
taichi::lang::vulkan::AotModuleParams aot_params{"my_app", vulkan_runtime.get()};
auto module = taichi::lang::aot::Module::load(taichi::Arch::vulkan, aot_params);
auto my_kernel = module->get_kernel("my_kernel");

// Allocate device buffer
taichi::lang::Device::AllocParams alloc_params;
alloc_params.host_write = true;
alloc_params.size = /*Ndarray size for `x`*/;
alloc_params.usage = taichi::lang::AllocUsage::Storage;
auto devalloc_x = embedded_device->device()->allocate_memory(alloc_params);

// Execute my_kernel without Python environment
taichi::lang::RuntimeContext host_ctx;
host_ctx.set_arg_devalloc(/*arg_id=*/0, devalloc_x, /*shape=*/{128}, /*element_shape=*/{3, 1});
my_kernel->launch(&host_ctx);

Note that Taichi only supports the Vulkan backend in the C++ runtime library. The Taichi team is working on supporting more backends.

Real functions (experimental)

All Taichi functions are inlined into the Taichi kernel at compile time. However, the kernel becomes lengthy and takes longer to compile if it contains too many Taichi function calls. This becomes especially obvious if a Taichi function involves compile-time recursion. For example, the following code calculates the Fibonacci numbers recursively:

@ti.func
def fib_impl(n: ti.template()):
    if ti.static(n <= 0):
        return 0
    if ti.static(n == 1):
        return 1
    return fib_impl(n - 1) + fib_impl(n - 2)

@ti.kernel
def fibonacci(n: ti.template()):
    print(fib_impl(n))

In this code, fib_impl() recursively calls itself until n reaches 1 or 0. The total number of calls to fib_impl() grows exponentially with n, and because every call is inlined, the length of the kernel grows exponentially as well. When n reaches 25, it takes more than a minute to compile the kernel.
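
The growth described above can be estimated without Taichi. The following plain-Python sketch counts how many copies of the fib_impl() body would be expanded if every call were inlined; the helper name inlined_calls is hypothetical and is not part of Taichi, and the count is only a rough model of kernel size, not a measurement of compile time.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def inlined_calls(n: int) -> int:
    """Count the fib_impl() bodies expanded when fib_impl(n) is fully inlined."""
    if n <= 1:
        # Base cases (n <= 0 or n == 1) expand to a single body with no further calls.
        return 1
    # One body for this call, plus everything expanded by the two recursive calls.
    return 1 + inlined_calls(n - 1) + inlined_calls(n - 2)


for n in (5, 10, 20, 25):
    print(n, inlined_calls(n))
```

Under this model, n = 25 already expands to several hundred thousand function bodies, which is consistent with the long compile time reported above.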

This release introduces "real function", a new type of Taichi function that compiles independently instead of being inlined into the kernel. It is an experimental feature and only supports scalar arguments and scalar return values for now.

You can use it by decorating the function with @ti.experimental.real_func. For example, the following is the real function version of the code above.

@ti.experimental.real_func
def fib_impl(n: ti.i32) -> ti.i32:
    if n <= 0:
        return 0
    if n == 1:
        return 1
    return fib_impl(n - 1) + fib_impl(n - 2)

@ti.kernel
def fibonacci(n: ti.i32):
    print(fib_impl(n))

The length of the kernel does not increase as n grows because the kernel only makes a call to the function instead of inlining the whole function. As a result, the code takes far less than a second to compile regardless of the value of n.

The main differences between a normal Taichi function and a real function are listed below:

  • You can write return statements in any part of a real function, while you cannot write return statements inside the scope of non-static if / for / while statements in a normal Taichi function.
  • A real function can be called recursively at runtime, while a normal Taichi function only supports compile-time recursion.
  • The return value and arguments of a real function must be type hinted, while the type hints are optional in a normal Taichi function.

Type annotations for literals

Previously, you could not explicitly give a type to a literal. For example,

@ti.kernel
def foo():
    a = 2891336453  # i32 overflow (>2^31-1)

In the code snippet above, 2891336453 is first converted to the default integer type (ti.i32 unless changed). This causes an overflow. Starting from v1.0.0, you can write type annotations for literals:

@ti.kernel
def foo():
    a = ti.u32(2891336453)  # similar to 2891336453u in C
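
To see why the literal needs an unsigned type, the range check can be reproduced in plain Python. This is only an illustrative sketch: it uses the standard ctypes module to show what two's-complement wrap-around to 32 bits would produce, and it is an assumption that an overflowing ti.i32 literal behaves the same way.

```python
import ctypes

literal = 2891336453
I32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold
U32_MAX = 2**32 - 1  # largest value an unsigned 32-bit integer can hold

fits_i32 = literal <= I32_MAX
fits_u32 = literal <= U32_MAX

# ctypes truncates to 32 bits without raising, which models wrap-around.
as_i32 = ctypes.c_int32(literal).value
as_u32 = ctypes.c_uint32(literal).value

print(fits_i32, fits_u32)
print(as_i32, as_u32)
```

The literal exceeds the signed 32-bit range but fits in 32 unsigned bits, so annotating it with ti.u32 keeps its value intact.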

Top-level loop configurations

You can use ti.loop_config to control the behavior of the subsequent top-level for-loop. Available parameters are:

  • block_dim: Sets the number of threads in a block on GPU.
  • parallelize: Sets the number of threads to use on CPU.
  • serialize: If you set serialize to True, the for-loop runs serially, and you can write break statements inside it (applies only to range/ndrange for-loops). Setting serialize to True is equivalent to setting parallelize to 1.

Here are two examples:

@ti.kernel
def break_in_serial_for() -> ti.i32:
    a = 0
    ti.loop_config(serialize=True)
    for i in range(100):  # This loop runs serially
        a += i
        if i == 10:
            break
    return a

break_in_serial_for()  # returns 55

n = 128
val = ti.field(ti.i32, shape=n)

@ti.kernel
def fill():
    ti.loop_config(parallelize=8, block_dim=16)
    # If the kernel is run on the CPU backend, 8 threads will be used to run it
    # If the kernel is run on the CUDA backend, each block will have 16 threads
    for i in range(n):
        val[i] = i

math module

This release adds a math module to support GLSL-standard vector operations and to make it easier to port GLSL shader code to Taichi. For example, vector types, including vec2, vec3, vec4, mat2, mat3, and mat4, and functions, including mix(), clamp(), and smoothstep(), act similarly to their counterparts in GLSL. See the following examples:

Vector initialization and swizzling

You can use the rgba, xyzw, and uvw properties to get and set vector entries:

import taichi as ti
import taichi.math as tm

@ti.kernel
def example():
    v = tm.vec3(1.0)  # (1.0, 1.0, 1.0)
    w = tm.vec4(0.0, 1.0, 2.0, 3.0)
    v.rgg += 1.0  # v = (2.0, 3.0, 1.0)
    w.zxy += tm.sin(v)

Matrix multiplication

Each Taichi vector is implemented as a column vector. Ensure that you put the matrix before the vector in a matrix multiplication.

@ti.kernel
def example():
    M = ti.Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    v = tm.vec3(1, 2, 3)
    w = (M @ v).xyz  # [1, 2, 3]

GLSL-standard functions

@ti.kernel
def example():
    v = tm.vec3(0., 1., 2.)
    w = tm.smoothstep(0.0, 1.0, v.xyz)
    w = tm.clamp(w, 0.2, 0.8)
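
For reference, the scalar definitions of these functions in the GLSL specification can be written in plain Python as below. This is a sketch of the GLSL formulas, not of Taichi's implementation; that the taichi.math versions agree with them in every edge case is an assumption.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """GLSL clamp: restrict x to the interval [lo, hi]."""
    return min(max(x, lo), hi)


def mix(a: float, b: float, t: float) -> float:
    """GLSL mix: linear interpolation between a and b with weight t."""
    return a * (1.0 - t) + b * t


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """GLSL smoothstep: Hermite interpolation between edge0 and edge1."""
    t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)


# Apply the same two steps as the kernel above, one component at a time.
v = (0.0, 1.0, 2.0)
w = tuple(smoothstep(0.0, 1.0, c) for c in v)
w = tuple(clamp(c, 0.2, 0.8) for c in w)
print(w)
```

With these definitions, the components of v map to 0.0, 1.0, and 1.0 after smoothstep, and to 0.2, 0.8, and 0.8 after clamp.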

CLI command ti gallery

This release introduces a CLI command ti gallery, allowing you to select and run Taichi examples in a pop-up window. To do so:

  1. Open a terminal and run:
ti gallery

A window pops up.

  2. Click to run any example in the pop-up window.
    The console prints the corresponding source code at the same time.

Improvements

Enhanced matrix type

As of v1.0.0, Taichi accepts matrix or vector types as parameters and return values. You can use ti.types.matrix or ti.types.vector as the type annotations.

Taichi also supports basic, read-only matrix slicing. Use the mat[:,:] syntax to quickly retrieve a specific portion of a matrix. See Slicings for more information.

The following code example shows how to get the numbers in the four corners of a 3x3 matrix mat:

import taichi as ti

ti.init()

@ti.kernel
def foo(mat: ti.types.matrix(3, 3, ti.i32)) -> ti.types.matrix(2, 2, ti.i32):
    corners = mat[::2, ::2]
    return corners
  
mat = ti.Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
corners = foo(mat)  # [[1 3] [7 9]]

Note that in a slice, the lower bound, the upper bound, and the stride must be constant integers. If you want to use a variable index together with a slice, you should set ti.init(dynamic_index=True). For example:

import taichi as ti

ti.init(dynamic_index=True)

@ti.kernel
def foo(mat: ti.types.matrix(3, 3, ti.i32), ind: ti.i32) -> ti.types.matrix(3, 1, ti.i32):
    col = mat[:, ind]
    return col
  
mat = ti.Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
col = foo(mat, 2)  # [3 6 9]
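
The two slices above select the same entries as the following plain-Python sketch, which models the matrix as a nested list. It is meant only to show which elements mat[::2, ::2] and mat[:, ind] pick out; it does not use Taichi, and Taichi returns matrix objects rather than lists.

```python
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# mat[::2, ::2]: every second row, then every second column within each row.
corners = [row[::2] for row in mat[::2]]

# mat[:, ind]: the entry at column ind from every row.
ind = 2
col = [row[ind] for row in mat]

print(corners)
print(col)
```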

More flexible Autodiff: Kernel Simplicity Rule removed

Flexibility is key to the user experience of an automatic-differentiation (AD) system. Before v1.0.0, Taichi's AD system required that a differentiable Taichi kernel consist only of multiple simply nested for-loops (shown in task1 below). This was once called the Kernel Simplicity Rule (KSR). KSR prevented Taichi's users from writing differentiable kernels with multiple serial for-loops (shown in task2 below) or with a mixture of serial for-loops and non-for statements (shown in task3 below).

# OK: multiple simply nested for-loops
@ti.kernel
def task1():
    for i in range(2):
        for j in range(3):
            for k in range(3):
                y[None] += x[None]

# Error: multiple serial for-loops
@ti.kernel
def task2():
    for i in range(2):
        for j in range(3):
            y[None] += x[None]
        for j in range(3):
            y[None] += x[None]

# Error: a mixture of serial for-loop and non-for
@ti.kernel
def task3():
    for i in range(2):
        y[None] += x[None]
        for j in range(3):
            y[None] += x[None]

With KSR removed in this release, code with different kinds of for-loop structures can be differentiated, as shown in the snippet below.

# OK: A complicated control flow that is still differentiable in Taichi
for j in range(2):
    for i in range(3):
        y[None] += x[None]
    for i in range(3):
        for ii in range(2):
            y[None] += x[None]
        for iii in range(2):
            y[None] += x[None]
            for iv in range(2):
                y[None] += x[None]
    for i in range(3):
        for ii in range(2):
            for iii in range(2):
                y[None] += x[None]
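
One way to sanity-check a gradient computed for a loop nest like the one above: every statement is y += x, so y ends up as N * x and dy/dx equals N, the number of times the statement runs. The sketch below counts N in plain Python. It assumes exactly the loop bounds shown above, and count_accumulations is a hypothetical helper rather than a Taichi function.

```python
def count_accumulations():
    """Count how often `y += x` runs in the loop nest shown above.

    Plain-Python illustration only: each `count += 1` stands in for one
    `y[None] += x[None]` statement of the original snippet.
    """
    count = 0
    for j in range(2):
        for i in range(3):
            count += 1
        for i in range(3):
            for ii in range(2):
                count += 1
            for iii in range(2):
                count += 1
                for iv in range(2):
                    count += 1
        for i in range(3):
            for ii in range(2):
                for iii in range(2):
                    count += 1
    return count


expected_grad = count_accumulations()
print(expected_grad)  # 78
```

Per outer iteration the three inner blocks contribute 3, 24, and 12 accumulations, so the total over two iterations is 2 * 39 = 78. A differentiated version of the snippet would therefore be expected to report a gradient of 78 for x.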

Taichi provides a demo that shows how to implement a differentiable simulator using this enhanced AD system.

f-string support in an assert statement

This release supports including an f-string in an assert statement as an error message. You can include scalar variables in the f-string. See the example below:

import taichi as ti

ti.init(debug=True)

@ti.kernel
def assert_is_zero(n: ti.i32):
    assert n == 0, f"The number is {n}, not zero"

assert_is_zero(42)  # TaichiAssertionError: The number is 42, not zero

Note that the assert statement works only in debug mode.

Documentation changes

Taichi language reference

This release comes with the first version of the Taichi language specification, which attempts to describe the syntax and semantics of the Taichi language exhaustively. Taichi's users and developers can use it as a reference when determining whether a specific behavior is correct, buggy, or undefined.

API changes

Deprecated

Deprecated | Replaced by
ti.ext_arr() | ti.types.ndarray()

Full changelog

  • [example] Add diff sph demo (#4769) (by Mingrui Zhang)
  • [autodiff] Fix nullptr during adjoint codegen (#4771) (by Ye Kuang)
  • [bug] Fix kernel profiler on CPU backend (#4768) (by Lin Jiang)
  • [example] Fix taichi_dynamic example (#4767) (by Yi Xu)
  • [aot] Provide a convenient API to set devallocation as argument (#4762) (by Ailing)
  • [Lang] Deprecate ti.pyfunc (#4764) (by Lin Jiang)
  • [misc] Bump version to v1.0.0 (#4763) (by Yi Xu)
  • [SIMT] Add all_sync warp intrinsics (#4718) (by Yongmin Hu)
  • [doc] Taichi spec: calls, unary ops, binary ops and comparison (#4663) (by squarefk)
  • [SIMT] Add any_sync warp intrinsics (#4719) (by Yongmin Hu)
  • [Doc] Update community standard (#4759) (by notginger)
  • [Doc] Propose the RFC process (#4755) (by Ye Kuang)
  • [Doc] Fixed a broken link (#4758) (by Vissidarte-Herman)
  • [Doc] Taichi spec: conditional expressions and simple statements (#4728) (by Xiangyun Yang)
  • [bug] [lang] Let matrix initialize to the target type (#4744) (by Lin Jiang)
  • [ci] Fix ci nightly (#4754) (by Bo Qiao)
  • [doc] Taichi spec: compound statements, if, while (#4658) (by Lin Jiang)
  • [build] Simplify build command for android (#4752) (by Ailing)
  • [lang] Add PolygonMode enum for rasterizer (#4750) (by Ye Kuang)
  • [Aot] Support template args in AOT module add_kernel (#4748) (by Ye Kuang)
  • [lang] Support in-place operations on math vectors (#4738) (by Lin Jiang)
  • [ci] Add python 3.6 and 3.10 to nightly release (#4740) (by Bo Qiao)
  • [Android] Fix Android get height issue (#4743) (by Ye Kuang)
  • Updated logo (#4745) (by Vissidarte-Herman)
  • [Error] Raise an error when non-static condition is passed into ti.static_assert (#4735) (by Lin Jiang)
  • [Doc] Taichi spec: For (#4689) (by Lin Jiang)
  • [SIMT] [cuda] Use correct source lane offset for warp intrinsics (#4734) (by Bo Qiao)
  • [SIMT] Add shfl_xor_i32 warp intrinsics (#4642) (by Yongmin Hu)
  • [Bug] Fix warnings (#4730) (by Peng Yu)
  • [Lang] Add vector swizzle feature to math module (#4629) (by TiGeekMan)
  • [Doc] Taichi spec: static expressions (#4702) (by Lin Jiang)
  • [Doc] Taichi spec: assignment expressions (#4725) (by Xiangyun Yang)
  • [mac] Fix external_func test failures on arm backend (#4733) (by Ailing)
  • [doc] Fix deprecated tools APIs in docs, tests, and examples (#4729) (by Yi Xu)
  • [ci] Switch to self-hosted PyPI for nightly release (#4706) (by Bo Qiao)
  • [Doc] Taichi spec: boolean operations (#4724) (by Xiangyun Yang)
  • [doc] Fix deprecated profiler APIs in docs, tests, and examples (#4726) (by Yi Xu)
  • [spirv] Ext arr name should include arg id (#4727) (by Ailing)
  • [SIMT] Add shfl_sync_i32/f32 warp intrinsics (#4717) (by Yongmin Hu)
  • [Lang] Add 2x2/3x3 matrix solve with Guass elimination (#4634) (by Peng Yu)
  • [metal] Tweak Device to support Ndarray (#4721) (by Ye Kuang)
  • [build] Fix non x64 linux builds (#4715) (by Bob Cao)
  • [Doc] Fix 4 typos in doc (#4714) (by Jiayi Weng)
  • [simt] Subgroup reduction primitives (#4643) (by Bob Cao)
  • [misc] Remove legacy LICENSE.txt (#4708) (by Yi Xu)
  • [gui] Make GGUI VBO configurable for mesh (#4707) (by Yuheng Zou)
  • [Docs] Change License from MIT to Apache-2.0 (#4701) (by notginger)
  • [Doc] Update docstring for module misc (#4644) (by Zhao Liang)
  • [doc] Proofread GGUI.md (#4676) (by Vissidarte-Herman)
  • [refactor] Remove Expression::serialize and add ExpressionHumanFriendlyPrinter (#4657) (by PGZXB)
  • [Doc] Remove extension_libraries in doc site (#4696) (by LittleMan)
  • [Lang] Let assertion error message support f-string (#4700) (by Lin Jiang)
  • [Doc] Taichi spec: prims, attributes, subscriptions, slicings (#4697) (by Yi Xu)
  • [misc] Add compile-config to offline-cache key (#4681) (by PGZXB)
  • [refactor] Remove legacy usage of ext_arr/any_arr in codebase (#4698) (by Yi Xu)
  • [doc] Taichi spec: pass, return, break, and continue (#4656) (by Lin Jiang)
  • [bug] Fix chain assignment (#4695) (by Lin Jiang)
  • [Doc] Refactored GUI.md (#4672) (by Vissidarte-Herman)
  • [misc] Update linux version name (#4685) (by Jiasheng Zhang)
  • [bug] Fix ndrange when start > end (#4690) (by Lin Jiang)
  • [bug] Fix bugs in test_offline_cache.py (#4674) (by PGZXB)
  • [Doc] Fix gif link (#4694) (by Ye Kuang)
  • [Lang] Add math module to support glsl-style functions (#4683) (by LittleMan)
  • [Doc] Editorial updates (#4688) (by Vissidarte-Herman)
  • Editorial updates (#4687) (by Vissidarte-Herman)
  • [ci] [windows] Add Dockerfile for Windows build and test (CPU) (#4667) (by Bo Qiao)
  • [Doc] Taichi spec: list and dictionary displays (#4665) (by Yi Xu)
  • [CUDA] Fix the fp32 to fp64 promotion due to incorrect fmax/fmin call (#4664) (by Haidong Lan)
  • [misc] Temporarily disable a flaky test (#4669) (by Yi Xu)
  • [bug] Fix void return (#4654) (by Lin Jiang)
  • [Workflow] Use pre-commit hooks to check codes (#4633) (by Frost Ming)
  • [SIMT] Add ballot_sync warp intrinsics (#4641) (by Wimaxs)
  • [refactor] [cuda] Refactor offline-cache and support it on arch=cuda (#4600) (by PGZXB)
  • [Error] [doc] Add TaichiAssertionError and add assert to the lang spec (#4649) (by Lin Jiang)
  • [Doc] Taichi spec: parenthesized forms; expression lists (#4653) (by Yi Xu)
  • [Doc] Updated definition of a 0D field. (#4651) (by Vissidarte-Herman)
  • [Doc] Taichi spec: variables and scope; atoms; names; literals (#4621) (by Yi Xu)
  • [doc] Fix broken links and update docs. (#4647) (by Chengchen(Rex) Wang)
  • [Bug] Fix broken links (#4646) (by Peng Yu)
  • [Doc] Refactored field.md (#4618) (by Vissidarte-Herman)
  • [gui] Allow to configure the texture data type (#4630) (by Gabriel H)
  • [vulkan] Fixes the string comparison when querying extensions (#4638) (by Bob Cao)
  • [doc] Add docstring for ti.loop_config (#4625) (by Lin Jiang)
  • [SIMT] Add shfl_up_i32/f32 warp intrinsics (#4632) (by Yu Zhang)
  • [Doc] Examples directory update (#4640) (by dongqi shen)
  • [vulkan] Choose better devices (#4614) (by Bob Cao)
  • [SIMT] Implement ti.simt.warp.shfl_down_i32 and add stubs for other warp-level intrinsics (#4616) (by Yuanming Hu)
  • [refactor] Refactor Identifier::id_counter to global and local counter (#4581) (by PGZXB)
  • [android] Disable XDG on non-supported platform (#4612) (by Gabriel H)
  • [gui] [aot] Allow set_image to use user VBO (#4611) (by Gabriel H)
  • [Doc] Add docsting for Camera class in ui module (#4588) (by Zhao Liang)
  • [metal] Implement buffer_fill for unified device API (#4595) (by Ye Kuang)
  • [Lang] Matrix 3x3 eigen decomposition (#4571) (by Peng Yu)
  • [Doc] Set up the basis of Taichi specification (#4603) (by Yi Xu)
  • [gui] Make GGUI VBO configurable for particles (#4610) (by Yuheng Zou)
  • [Doc] Update with Python 3.10 support (#4609) (by Bo Qiao)
  • [misc] Bump version to v0.9.3 (#4608) (by Taichi Gardener)
  • [Lang] Deprecate ext_arr/any_arr in favor of types.ndarray (#4598) (by Yi Xu)
  • [Doc] Adjust CPU GUI document layout (#4605) (by Peng Yu)
  • [Doc] Refactored Type system. (#4584) (by Vissidarte-Herman)
  • [lang] Fix vector matrix ndarray to numpy layout (#4597) (by Bo Qiao)
  • [bug] Fix bug that caching kernels with same AST will fail (#4582) (by PGZXB)
taichi - v0.9.2

Published by github-actions[bot] over 2 years ago

Highlights:

  • CI/CD workflow
    • Generate manylinux2014-compatible wheels with CUDA backend in release workflow (#4550) (by Yi Xu)
  • Command line interface
    • Fix a few bugs in taichi gallery command (#4548) (by Zhao Liang)
  • Documentation
    • Fixed broken links. (#4563) (by Vissidarte-Herman)
    • Refactored README.md (#4549) (by Vissidarte-Herman)
    • Create CODE_OF_CONDUCT (#4564) (by notginger)
    • Update syntax.md (#4557) (by Vissidarte-Herman)
    • Update docstring for ndrange (#4486) (by Zhao Liang)
    • Minor updates: It is recommended to type hint arguments and return values (#4510) (by Vissidarte-Herman)
    • Refactored Kernels and functions. (#4496) (by Vissidarte-Herman)
    • Add initial variable and fragments (#4457) (by Justin)
  • Language and syntax
    • Add taichi gallery command for user to choose and run example in gui (#4532) (by TiGeekMan)
    • Add ti.serialize and ti.loop_config (#4525) (by Lin Jiang)
    • Support simple matrix slicing (#4488) (by Xiangyun Yang)
    • Remove legacy ways to construct matrices (#4521) (by Yi Xu)

Full changelog:

  • [lang] Replace keywords in python (#4606) (by Jiasheng Zhang)
  • [lang] Fix py36 block_dim bug (#4601) (by Jiasheng Zhang)
  • [ci] Fix release script bug (#4599) (by Jiasheng Zhang)
  • [aot] Support return in vulkan aot (#4593) (by Ailing)
  • [ci] Release script add test for tests/python/examples (#4590) (by Jiasheng Zhang)
  • [misc] Write version info right after creation of uuid (#4589) (by Jiasheng Zhang)
  • [gui] Make GGUI VBO configurable (#4575) (by Ye Kuang)
  • [test] Fix ill-formed test_binary_func_ret (#4587) (by Yi Xu)
  • Update differences_between_taichi_and_python_programs.md (#4583) (by Vissidarte-Herman)
  • [misc] Fix a few warnings (#4572) (by Ye Kuang)
  • [aot] Remove redundant module_path argument (#4573) (by Ailing)
  • [bug] [opt] Fix some bugs when deal with real function (#4568) (by Xiangyun Yang)
  • [build] Guard llvm usage inside TI_WITH_LLVM (#4570) (by Ailing)
  • [aot] [refactor] Add make_new_field for Metal (#4559) (by Bo Qiao)
  • [llvm] [lang] Add support for multiple return statements in real function (#4536) (by Lin Jiang)
  • [test] Add test for offline-cache (#4562) (by PGZXB)
  • Format updates (#4567) (by Vissidarte-Herman)
  • [aot] Add KernelTemplate interface (#4558) (by Ye Kuang)
  • [test] Eliminate the warnings in test suite (#4556) (by Frost Ming)
  • [Doc] Fixed broken links. (#4563) (by Vissidarte-Herman)
  • [Doc] Refactored README.md (#4549) (by Vissidarte-Herman)
  • [Doc] Create CODE_OF_CONDUCT (#4564) (by notginger)
  • [misc] Reset counters in Program::finalize() (#4561) (by PGZXB)
  • [misc] Add TI_CI env to CI/CD (#4551) (by Jiasheng Zhang)
  • [ir] Add basic tests for Block (#4553) (by Ye Kuang)
  • [refactor] Fix error message (#4552) (by Ye Kuang)
  • [Doc] Update syntax.md (#4557) (by Vissidarte-Herman)
  • [gui] Hack to make GUI.close() work on macOS (#4555) (by Ye Kuang)
  • [aot] Fix get_kernel API semantics (#4554) (by Ye Kuang)
  • [opt] Support offline-cache for kernel with arch=cpu (#4500) (by PGZXB)
  • [CLI] Fix a few bugs in taichi gallery command (#4548) (by Zhao Liang)
  • [ir] Small optimizations to codegen (#4442) (by Bob Cao)
  • [CI] Generate manylinux2014-compatible wheels with CUDA backend in release workflow (#4550) (by Yi Xu)
  • [misc] Metadata update (#4539) (by Jiasheng Zhang)
  • [test] Parametrize the test cases with pytest.mark (#4546) (by Frost Ming)
  • [Doc] Update docstring for ndrange (#4486) (by Zhao Liang)
  • [build] Default symbol visibility to hidden for all targets (#4545) (by Gabriel H)
  • [autodiff] Handle multiple, mixed Independent Blocks (IBs) within multi-levels serial for-loops (#4523) (by Mingrui Zhang)
  • [bug] [lang] Cast the arguments of real function to the desired types (#4538) (by Lin Jiang)
  • [Lang] Add taichi gallery command for user to choose and run example in gui (#4532) (by TiGeekMan)
  • [bug] Fix bug that calling std::getenv when cpp-tests running will fail (#4537) (by PGZXB)
  • [vulkan] Fix performance (#4535) (by Bob Cao)
  • [Lang] Add ti.serialize and ti.loop_config (#4525) (by Lin Jiang)
  • [Lang] Support simple matrix slicing (#4488) (by Xiangyun Yang)
  • Update vulkan_api.cpp (#4533) (by Bob Cao)
  • [lang] Quick fix for mesh_local analyzer (#4529) (by Chang Yu)
  • [test] Show arch info in the verbose test report (#4528) (by Frost Ming)
  • [aot] Add binding_id of root/gtmp/rets/args bufs to CompiledOffloadedTask (#4522) (by Ailing)
  • [vulkan] Relax a few test precisions for vulkan (#4524) (by Ailing)
  • [build] Option to use LLD (#4513) (by Bob Cao)
  • [misc] [linux] Implement XDG Base Directory support (#4514) (by ruro)
  • [Lang] [refactor] Remove legacy ways to construct matrices (#4521) (by Yi Xu)
  • [misc] Make result of irpass::print hold more information (#4517) (by PGZXB)
  • [refactor] Misc improvements over AST helper functions (#4398) (by daylily)
  • [misc] [build] Bump catch external library 2.13.3 -> 2.13.8 (#4516) (by ruro)
  • [autodiff] Reduce the number of ad stack using knowledge of derivative formulas (#4512) (by Mingrui Zhang)
  • [ir] [opt] Fix a bug about 'continue' stmt in cfg_build (#4507) (by Xiangyun Yang)
  • [Doc] Minor updates: It is recommended to type hint arguments and return values (#4510) (by Vissidarte-Herman)
  • [ci] Fix the taichi repo name by hardcode (#4506) (by Frost Ming)
  • [build] Guard dx lib search with TI_WITH_DX11 (#4505) (by Ailing)
  • [ci] Reduce the default device memory usage for GPU tests (#4508) (by Bo Qiao)
  • [Doc] Refactored Kernels and functions. (#4496) (by Vissidarte-Herman)
  • [aot] [refactor] Refactor AOT field API for Vulkan (#4490) (by Bo Qiao)
  • [ci] Fix: fill in the pull request body created by bot (#4503) (by Frost Ming)
  • [ci] Skip in steps rather than the whole job (#4499) (by Frost Ming)
  • [ci] Add a Dockerfile for building manylinux2014-compatible Taichi wheels with CUDA backend (#4491) (by Yi Xu)
  • [ci] Automate release publishing (#4428) (by Frost Ming)
  • [fix] dangling ti.func decorator in euler.py (#4492) (by Zihua Wu)
  • [ir] Fix a bug in simplify pass (#4489) (by Xiangyun Yang)
  • [test] Add test for recursive real function (#4477) (by Lin Jiang)
  • [Doc] Add initial variable and fragments (#4457) (by Justin)
  • [misc] Add a convenient script for testing compatibility of Taichi releases. (#4485) (by Chengchen(Rex) Wang)
  • [misc] Version bump: v0.9.1 -> v0.9.2 (#4484) (by Chengchen(Rex) Wang)
  • [ci] Update gpu docker image to test python 3.10 (#4472) (by Bo Qiao)
taichi - v0.9.1

Published by rexwangcc over 2 years ago

Highlights:

  • CI/CD workflow
    • Cleanup workspace before window test (#4405) (by Jian Zeng)
  • Documentation
    • Update docstrings for functions in ops (#4465) (by Zhao Liang)
    • Update docstring for functions in misc (#4474) (by Zhao Liang)
    • Update docstrings in misc (#4446) (by Zhao Liang)
    • Update docstring for functions in operations (#4427) (by Zhao Liang)
    • Update PyTorch interface documentation (#4311) (by Andrew Sun)
    • Update docstring for functions in operations (#4413) (by Zhao Liang)
    • Update docstring for functions in operations (#4392) (by Zhao Liang)
    • Fix broken links (#4368) (by Ye Kuang)
    • Re-structure the articles: getting-started, gui (#4360) (by Ye Kuang)
  • Error messages
    • Add error message when the number of elements in kernel arguments exceed (#4444) (by Xiangyun Yang)
    • Add error for invalid snode size (#4460) (by Lin Jiang)
    • Add error messages for wrong type annotations of literals (#4462) (by Yi Xu)
    • Remove the mentioning of ti.pyfunc in the error message (#4429) (by Lin Jiang)
  • Language and syntax
    • Support sparse matrix builder datatype configuration (#4411) (by Peng Yu)
    • Support type annotations for literals (#4440) (by Yi Xu)
    • Support simple matrix slicing (#4420) (by Xiangyun Yang)
    • Support kernel to return a matrix type value (#4062) (by Xiangyun Yang)
  • Vulkan backend
    • Enable Vulkan device selection when using cuda (#4330) (by Bo Qiao)

Full changelog:

  • [bug] [llvm] Initialize the field to 0 when finalizing a field (#4463) (by Lin Jiang)
  • [Doc] Update docstrings for functions in ops (#4465) (by Zhao Liang)
  • [Error] Add error message when the number of elements in kernel arguments exceed (#4444) (by Xiangyun Yang)
  • [Doc] Update docstring for functions in misc (#4474) (by Zhao Liang)
  • [metal] Support device memory allocation/deallocation (#4439) (by Ye Kuang)
  • update docstring for exceptions (#4475) (by Zhao Liang)
  • [llvm] Support real function with single scalar return value (#4452) (by Lin Jiang)
  • [refactor] Remove LLVM logic from the generic Device interface (#4470) (by PGZXB)
  • [lang] Add decorator ti.experimental.real_func (#4458) (by Lin Jiang)
  • [ci] Add python 3.10 into nightly test and release (#4467) (by Bo Qiao)
  • [bug] Fix metal linker error when TI_WITH_METAL=OFF (#4469) (by Bo Qiao)
  • [Lang] Support sparse matrix builder datatype configuration (#4411) (by Peng Yu)
  • [Error] Add error for invalid snode size (#4460) (by Lin Jiang)
  • [aot] [refactor] Refactor AOT runtime API to use module (#4437) (by Bo Qiao)
  • [misc] Optimize verison check (#4461) (by Jiasheng Zhang)
  • [Error] Add error messages for wrong type annotations of literals (#4462) (by Yi Xu)
  • [misc] Remove some warnings (#4453) (by PGZXB)
  • [refactor] Move literal construction to expr module (#4448) (by Yi Xu)
  • [bug] [lang] Enable break in the outermost for not in the outermost scope (#4447) (by Lin Jiang)
  • [Doc] Update docstrings in misc (#4446) (by Zhao Liang)
  • [llvm] Support real function which has scalar arguments (#4422) (by Lin Jiang)
  • [Lang] Support type annotations for literals (#4440) (by Yi Xu)
  • [misc] Remove a unnecessary function (#4443) (by PGZXB)
  • [metal] Expose BufferMemoryView (#4432) (by Ye Kuang)
  • [Lang] Support simple matrix slicing (#4420) (by Xiangyun Yang)
  • [Doc] Update docstring for functions in operations (#4427) (by Zhao Liang)
  • [metal] Add Unified Device API skeleton code (#4431) (by Ye Kuang)
  • [refactor] Refactor llvm-offloaded-task-name mangling (#4418) (by PGZXB)
  • [Doc] Update PyTorch interface documentation (#4311) (by Andrew Sun)
  • [misc] Add deserialization tool for benchmarks (#4278) (by rocket)
  • [misc] Add matrix operations to micro-benchmarks (#4190) (by rocket)
  • [Error] Remove the mentioning of ti.pyfunc in the error message (#4429) (by Lin Jiang)
  • [metal] Add AotModuleLoader (#4423) (by Ye Kuang)
  • [Doc] Update docstring for functions in operations (#4413) (by Zhao Liang)
  • [vulkan] Support templated kernel in aot module (#4417) (by Ailing)
  • [vulkan] [aot] Add aot namespace Vulkan (#4419) (by Bo Qiao)
  • [Lang] Support kernel to return a matrix type value (#4062) (by Xiangyun Yang)
  • [test] Add a test for the ad_gravity example (#4404) (by FZC)
  • [Doc] Update docstring for functions in operations (#4392) (by Zhao Liang)
  • [CI] Cleanup workspace before window test (#4405) (by Jian Zeng)
  • [build] Enforce compatibility with manylinux2014 when TI_WITH_VULKAN=OFF (#4406) (by Yi Xu)
  • [ci] Update tag to projects (#4400) (by Bo Qiao)
  • [ci] Reduce test parallelism for m1 (#4394) (by Bo Qiao)
  • [aot] [vulkan] Add AotKernel and its Vulkan impl (#4387) (by Ye Kuang)
  • [vulkan] [aot] Move add_root_buffer to public members (#4396) (by Gabriel H)
  • [llvm] Remove LLVM functions related to a SNode tree from the module when the SNode tree is destroyed (#4356) (by Lin Jiang)
  • [test] disable serveral workflows on forks (#4393) (by Jian Zeng)
  • [ci] Windows build exits on the first error (#4391) (by Bo Qiao)
  • [misc] Upgrade test and docker image to support python 3.10 (#3986) (by Bo Qiao)
  • [aot] [vulkan] Output shapes/dims to AOT exported module (#4382) (by Gabriel H)
  • [test] Merge the py38 only cases into the main test suite (#4378) (by Frost Ming)
  • [vulkan] Refactor Runtime to decouple the SNodeTree part (#4380) (by Ye Kuang)
  • [lang] External Ptr alias analysis & demote atomics (#4273) (by Bob Cao)
  • [example] Fix implicit_fem example command line arguments (#4372) (by bx2k)
  • [mesh] Constructing mesh from data in memory (#4375) (by bx2k)
  • [refactor] Move aot_module files (#4374) (by Ye Kuang)
  • [test] Add test for exposed top-level APIs (#4361) (by Yi Xu)
  • [refactor] Move arch files (#4373) (by Ye Kuang)
  • [build] Build with Apple clang-13 (#4370) (by Ailing)
  • [test] [example] Add a test for print_offset example (#4355) (by Zhi Qi)
  • [test] Add a test for the game_of_life example (#4365) (by 0xzhang)
  • [test] Add a test for the nbody example (#4366) (by 0xzhang)
  • [Doc] Fix broken links (#4368) (by Ye Kuang)
  • [ci] Run vulkan and metal separately on M1 (#4367) (by Ailing)
  • [Doc] Re-structure the articles: getting-started, gui (#4360) (by Ye Kuang)
  • [Vulkan] Enable Vulkan device selection when using cuda (#4330) (by Bo Qiao)
  • [misc] Version bump: v0.9.0->v0.9.1 (#4363) (by Ailing)
  • [dx11] Materialize runtime, map and unmap (#4339) (by quadpixels)