Productive, portable, and performant GPU programming in Python.
Apache-2.0 License
Published by github-actions[bot] 11 months ago
We are excited to announce the stabilization of the Real Function feature in Taichi Lang v1.7.0. Initially introduced as an experimental feature in v1.0.0, it has now matured with enhanced capabilities and usability.
Real functions are now declared with @ti.real_func; the previous decorator, @ti.experimental.real_func, is deprecated. Real functions, unlike inline Taichi functions (@ti.func), are compiled as separate entities, akin to CUDA's device functions. This separation allows for recursive runtime calls and significantly faster compilation. For instance, the Cornell box example's compilation time drops from 2.34s to 1.01s on an i9-11900K when switching from inline to real functions.

Important note on usage: ensure that all arguments and return values of a real function are explicitly type-hinted.
The following example demonstrates the recursive capability of real functions. The sum_func real function calculates the sum of the numbers from 1 to n, showcasing its ability to handle multiple return statements and variable recursion depths.
```python
@ti.real_func
def sum_func(n: ti.i32) -> ti.i32:
    if n == 0:
        return 0
    return sum_func(n - 1) + n

@ti.kernel
def sum(n: ti.i32) -> ti.i32:
    return sum_func(n)

print(sum(100))  # 5050
```
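The recursion above follows ordinary Python semantics, so the same function can be sanity-checked outside Taichi. A minimal plain-Python sketch (no Taichi involved):

```python
# Plain-Python equivalent of sum_func, useful for sanity-checking
# the Taichi real function without a GPU backend.
def sum_func(n: int) -> int:
    if n == 0:
        return 0
    return sum_func(n - 1) + n

print(sum_func(100))  # 5050
```

The result agrees with the closed form n(n + 1)/2.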
You can find more examples of real functions in the repository.
In this update, we've introduced the capability to return multiple values from a Taichi kernel by specifying a tuple as the return type. You can use (ti.f32, s0) directly as the type hint, write it in the Python manner as typing.Tuple[ti.f32, s0], or, for Python 3.9 and above, as tuple[ti.f32, s0]. The following example illustrates this new feature:
```python
s0 = ti.types.struct(a=ti.math.vec3, b=ti.i16)

@ti.real_func
def foo() -> (ti.f32, s0):
    return 1, s0(a=ti.math.vec3([100, 0.5, 3]), b=1)

@ti.kernel
def bar() -> (ti.f32, s0):
    return foo()

ret1, ret2 = bar()
print(ret1)  # 1.0
print(ret2)  # {'a': [100.0, 0.5, 3.0], 'b': 1}
```
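The two Python-style spellings mentioned above are ordinary type annotations. A quick plain-Python check (the functions `f` and `g` below are illustrative, not Taichi API) showing that both describe the same tuple shape:

```python
import typing

# Both annotation spellings describe a pair of (float, int);
# tuple[...] requires Python 3.9+, typing.Tuple works earlier.
def f() -> typing.Tuple[float, int]:
    return 1.0, 2

def g() -> tuple[float, int]:
    return 1.0, 2

assert f() == g() == (1.0, 2)
print(typing.get_type_hints(g)["return"])  # tuple[float, int]
```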
We have removed the size restrictions on kernel arguments and return values. Nevertheless, keeping them small is advisable: large argument or return value sizes can lead to substantially longer compile times. Although larger sizes are supported, we have not thoroughly tested arguments and return values exceeding 4 KB and cannot guarantee they work flawlessly.
Taichi now introduces a powerful feature for developers: Argument Packs. They enable efficient caching of unchanged parameters between multiple kernel calls, which is convenient when launching a kernel and also boosts performance.
```python
import taichi as ti

ti.init()

# Define a custom argument type using ti.types.argpack
view_params_tmpl = ti.types.argpack(view_mtx=ti.math.mat4, proj_mtx=ti.math.mat4, far=ti.f32)

# Declare a Taichi kernel that takes an argument pack
@ti.kernel
def p(view_params: view_params_tmpl) -> ti.f32:
    return view_params.far

# Instantiate the argument pack
view_params = view_params_tmpl(
    view_mtx=ti.math.mat4(
        [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]),
    proj_mtx=ti.math.mat4(
        [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]),
    far=1)

# Launch the kernel with the argument pack
print(p(view_params))  # Outputs: 1.0
```
Argument Packs are currently compatible with a variety of data types, including scalar, matrix, vector, Ndarray, and Struct.
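Conceptually, an argument pack bundles parameters that stay constant across many launches into one object that is built once and reused. A plain-Python analogy of that pattern (not the Taichi API; the names below are illustrative):

```python
from dataclasses import dataclass

# Plain-Python analogy of an argument pack: bundle parameters that
# stay constant across many kernel launches into one frozen object.
@dataclass(frozen=True)
class ViewParams:
    view_mtx: tuple  # stand-in for ti.math.mat4
    proj_mtx: tuple
    far: float

def p(view_params: ViewParams) -> float:
    # The "kernel" only reads fields; the pack itself never changes.
    return view_params.far

identity = tuple(tuple(1.0 if i == j else 0.0 for j in range(4)) for i in range(4))
params = ViewParams(view_mtx=identity, proj_mtx=identity, far=1.0)

for _ in range(3):  # many launches, one pack
    assert p(params) == 1.0
print(p(params))  # 1.0
```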
Please note that Argument Packs currently do not support the following features and data types:

- ti.template
- ti.data_oriented

CUDA memory preallocation is now controlled via the device_memory_GB and device_memory_fraction options of ti.init, and preallocation for sparse data structures (e.g. ti.pointer) occurs only once a sparse data structure is detected in your code. This noticeably reduces memory consumption, for example:

- diffmpm3d: 3866 MB --> 3190 MB
- nerf_train_deploy: 5618 MB --> 4664 MB
Added the following ti.simt.block APIs:

- ti.simt.block.sync_any_nonzero
- ti.simt.block.sync_all_nonzero
- ti.simt.block.sync_count_nonzero
Added a helper function to create a 2D/3D sparse grid, for example:

```python
# create a 2D sparse grid
grid = ti.sparse.grid(
    {
        "pos": ti.math.vec2,
        "mass": ti.f32,
        "grid2particles": ti.types.vector(20, ti.i32),
    },
    shape=(10, 10),
)

# access the grid
grid[0, 0].pos = ti.math.vec2(1, 2)
grid[0, 0].mass = 1.0
grid[0, 0].grid2particles[2] = 123
```
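As a mental model (not the Taichi implementation), a sparse grid behaves like a mapping that materializes a cell's record of fields only when the cell is first touched. A plain-Python sketch:

```python
from collections import defaultdict

# Plain-Python analogy of a 2D sparse grid: cells are materialized
# only when touched, and each cell holds a small record of fields.
def make_cell():
    return {"pos": [0.0, 0.0], "mass": 0.0, "grid2particles": [0] * 20}

grid = defaultdict(make_cell)

grid[(0, 0)]["pos"] = [1.0, 2.0]
grid[(0, 0)]["mass"] = 1.0
grid[(0, 0)]["grid2particles"][2] = 123

print(len(grid))  # 1: only the touched cell exists
```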
Additional changes in this release:

- Added ti_import_cpu_memory() and ti_import_cuda_memory()
- ti.ad.Tape
- taichi.math APIs and ti.func
- Fixed vec_to_euler that breaks GGUI cameras & handle camera logic better
- Deprecated ti.experimental.real_func because it is no longer experimental. Please use ti.real_func instead.

Highlights:
- **Bug fixes**
- Fix macro error with ti_import_cpu_memory (#8401) (by **Zhanlue Yang**)
- Fix argpack nesting issues (by **listerily**)
- Convert matrices to structs in argpack type members, Fixing layout error (by **listerily**)
- Fix error when returning a struct field member when the return … (#8271) (by **秋云未云**)
- Fix Erroneous handling of ndarray in real function in CFG (#8245) (by **Lin Jiang**)
- Fix issue with passing python-scope Matrix as ti.func argument (#8197) (by **Zhanlue Yang**)
- Fix incorrect CFG Graph structure due to missing Block wiith OffloadedStmts on LLVM backend (#8113) (by **Zhanlue Yang**)
- Fix type inference error with LowerMatrixPtr pass (#8105) (by **Zhanlue Yang**)
- Set initial value for Cuda device allocation (#8063) (by **Zhanlue Yang**)
- Fix the insertion position of the access chain (#7957) (by **Lin Jiang**)
- Fix wrong datatype size when writing to ndarray from Python scope (by **Ailing Zhang**)
- **CUDA backend**
- Warn driver version if it doesn't support memory pool. (#7912) (by **Haidong Lan**)
- **Documentation**
- Fixing typo in impl.py on ti.grouped function documentation (#8407) (by **Quentin Warnant**)
- Update doc about kernels and functions (#8400) (by **Lin Jiang**)
- Update documentation (#8089) (by **Zhao Liang**)
- Update docstring for inverse func (#8170) (by **Zhao Liang**)
- Update type.md, add descriptions of the vector (#8048) (by **Chenzhan Shang**)
- Fix a bug in faq.md (#7992) (by **Zhao Liang**)
- Fix problems in type_system.md (#7949) (by **秋云未云**)
- Add doc about struct arguments (#7959) (by **Lin Jiang**)
- Fix docstring of mix function (#7922) (by **Zhao Liang**)
- Update faq and ggui, and add them to CI (#7861) (by **Zhao Liang**)
- Add kernel sync doc (#7831) (by **Zhao Liang**)
- **Error messages**
- Warn before calling the external function (#8177) (by **Lin Jiang**)
- Add option to print full traceback in Python (#8160) (by **Lin Jiang**)
- Let to_primitive_type throw an error if the type is a pointer (by **lin-hitonami**)
- Update deprecation warning of the graph arguments (#7965) (by **Lin Jiang**)
- **Language and syntax**
- Add clz instruction (#8276) (by **Jett Chen**)
- Move real function out of the experimental module (#8399) (by **Lin Jiang**)
- Fix error with loop unique analysis for MatrixPtrStmt (#8307) (by **Zhanlue Yang**)
- Pass DebugInfo from Python to C++ for ndarray and field (#8286) (by **魔法少女赵志辉**)
- Support TensorType for SharedArray (#8258) (by **Zhanlue Yang**)
- Use ErrorEmitter in type check passes (#8285) (by **魔法少女赵志辉**)
- Implement struct DebugInfo and ErrorEmitter (#8284) (by **魔法少女赵志辉**)
- Add TensorType support for Constant Folding (#8250) (by **Zhanlue Yang**)
- Support TensorType for irpass::alg_simp() (#8225) (by **Zhanlue Yang**)
- Support vector/matrix ndarray arguments in real function (by **Lin Jiang**)
- Fix error on ndarray type check (by **Lin Jiang**)
- Support real function in data-oriented classes (by **lin-hitonami**)
- Let kernel support return type annotated with 'typing.Tuple' (by **lin-hitonami**)
- Support tuple return value for kernel and real function (by **lin-hitonami**)
- Let static assert be in static scope (#8217) (by **Lin Jiang**)
- Avoid scalarization for AOS GlobalPtrStmt (#8187) (by **Zhanlue Yang**)
- Support matrix return value for real function (by **lin-hitonami**)
- Support ndarray argument for real function (by **lin-hitonami**)
- Cast the scalar arguments and return values of ti.func if the type hints exist (#8193) (by **Lin Jiang**)
- Handle MatrixPtrStmt for uniquely_accessed_pointers() (#8165) (by **Zhanlue Yang**)
- Support struct arguments for real function (by **lin-hitonami**)
- Merge irpass::half2_vectorize() with irpass::scalarize() (#8102) (by **Zhanlue Yang**)
- Migrate irpass::scalarize() after optimize_bit_struct_stores & determine_ad_stack_size (#8097) (by **Zhanlue Yang**)
- Migrate irpass::scalarize() after irpass::demote_operations() (#8096) (by **Zhanlue Yang**)
- Migrate irpass::scalarize() after irpass::lower_access() (#8091) (by **Zhanlue Yang**)
- Migrate irpass::scalarize() after irpass::make_block_local() (#8090) (by **Zhanlue Yang**)
- Support TensorType for Dead-Store-Elimination (#8065) (by **Zhanlue Yang**)
- Optimize alias checking conditions for store-to-load forwarding (#8079) (by **Zhanlue Yang**)
- Support TensorType for Load-Store-Forwarding (#8058) (by **Zhanlue Yang**)
- Fix TensorTyped error with irpass::make_thread_local() (#8051) (by **Zhanlue Yang**)
- Fix numerical issue with auto_diff() (#8025) (by **Zhanlue Yang**)
- Migrate irpass::scalarize() after irpass::make_mesh_block_local() (#8030) (by **Zhanlue Yang**)
- Migrate irpass::scalarize() after irpass::make_thread_local() (#8028) (by **Zhanlue Yang**)
- Support allocate with cuda memory pool and reduce preallocation size accordingly (#7929) (by **Zhanlue Yang**)
- Migrate irpass::scalarize() after irpass::demote_no_access_mesh_fors() (#7956) (by **Zhanlue Yang**)
- Fix error with irpass::check_out_of_bound() for TensorTyped ExternalPtrStmt (#7997) (by **Zhanlue Yang**)
- Migrate irpass::scalarize() after irpass::demote_atomics() (#7943) (by **Zhanlue Yang**)
- Separate out preallocation logics for runtime objects (#7938) (by **Zhanlue Yang**)
- Remove deprecated funcs in __init__.py (#7941) (by **Lin Jiang**)
- Remove deprecated sparse_matrix_builder function (#7942) (by **Lin Jiang**)
- Remove deprecated compile option ndarray_use_cached_allocator (#7937) (by **Zhanlue Yang**)
- Migrate irpass::scalarize() after irpass::detect_read_only() (#7939) (by **Zhanlue Yang**)
- Remove deprecated funcs in ti.ui (#7940) (by **Lin Jiang**)
- Remove the support for 'is' (#7930) (by **Lin Jiang**)
- Migrate irpass::scalarize() after irpass::offload() (#7919) (by **Zhanlue Yang**)
- Raise error when the dimension of the ndrange does not equal to the number of the loop variable (#7933) (by **Lin Jiang**)
- Remove a.atomic(b) (#7925) (by **Lin Jiang**)
- Cancel deprecating native min/max (#7928) (by **Lin Jiang**)
- Fix the api doc search problem (#7918) (by **Zhao Liang**)
- Move irpass::scalarize() after irpass::auto_diff() (#7902) (by **Zhanlue Yang**)
- Fix Ndarray fill with Matrix/Vector typed values (#7901) (by **Zhanlue Yang**)
- Add cast to field.fill() interface (#7899) (by **Zhanlue Yang**)
- Let nested data classes have methods (#7909) (by **Lin Jiang**)
- Let kernel argument support matrix nested in a struct (by **lin-hitonami**)
- Support the functions of dataclass as kernel argument and return value (#7865) (by **Lin Jiang**)
- Fix a bug on PosixPath (#7860) (by **Zhao Liang**)
- Postpone MatrixType scalarization to irpass::differentiation_validation_check() (#7839) (by **Zhanlue Yang**)
- Postpone MatrixType scalarization to irpass::gather_meshfor_relation_types() (#7838) (by **Zhanlue Yang**)
- **Miscellaneous**
- Make clang-tidy happy on 'explicit' (#7999) (by **秋云未云**)
- **OpenGL backend**
- Fix: runtime caught error cannot be displayed in opengl (#7998) (by **秋云未云**)
- **IR optimization passes**
- Make merging casts int(int(x)) less aggressive (#7944) (by **Ailing**)
- Fix redundant clone of stmts across offloaded tasks (#7927) (by **Ailing**)
- **Refactor**
- Refactor the argument passing logic of rwtexture and remove extra_args (#7914) (by **Lin Jiang**)
Published by github-actions[bot] over 1 year ago
Removed API | Replace with |
---|---|
Using atomic operations like a.atomic_add(b) | ti.atomic_add(a, b) or a += b |
Using is and is not inside Taichi kernel and Taichi function | Not supported |
Ndrange for loop with the number of the loop variables not equal to the dimension of the ndrange | Not supported |
ti.ui.make_camera() | ti.ui.Camera() |
ti.ui.Window.write_image() | ti.ui.Window.save_image() |
ti.SOA | ti.Layout.SOA |
ti.AOS | ti.Layout.AOS |
ti.print_profile_info | ti.profiler.print_scoped_profiler_info |
ti.clear_profile_info | ti.profiler.clear_scoped_profiler_info |
ti.print_memory_profile_info | ti.profiler.print_memory_profiler_info |
ti.CuptiMetric | ti.profiler.CuptiMetric |
ti.get_predefined_cupti_metrics | ti.profiler.get_predefined_cupti_metrics |
ti.print_kernel_profile_info | ti.profiler.print_kernel_profiler_info |
ti.query_kernel_profile_info | ti.profiler.query_kernel_profiler_info |
ti.clear_kernel_profile_info | ti.profiler.clear_kernel_profiler_info |
ti.kernel_profiler_total_time | ti.profiler.get_kernel_profiler_total_time |
ti.set_kernel_profiler_toolkit | ti.profiler.set_kernel_profiler_toolkit |
ti.set_kernel_profile_metrics | ti.profiler.set_kernel_profiler_metrics |
ti.collect_kernel_profile_metrics | ti.profiler.collect_kernel_profiler_metrics |
ti.VideoManager | ti.tools.VideoManager |
ti.PLYWriter | ti.tools.PLYWriter |
ti.imread | ti.tools.imread |
ti.imresize | ti.tools.imresize |
ti.imshow | ti.tools.imshow |
ti.imwrite | ti.tools.imwrite |
ti.ext_arr | ti.types.ndarray |
ti.any_arr | ti.types.ndarray |
ti.Tape | ti.ad.Tape |
ti.clear_all_gradients | ti.ad.clear_all_gradients |
ti.linalg.sparse_matrix_builder | ti.types.sparse_matrix_builder |
Deprecation notices:

- The element_shape argument for scalar and ndarray
- The shape, channel_format and num_channels arguments for texture
- The cc backend will be removed in the next release (v1.7.0)

You can now use struct arguments in all backends. The structs can be nested, and they can contain matrices and vectors. Here's an example:
```python
transform_type = ti.types.struct(R=ti.math.mat3, T=ti.math.vec3)
pos_type = ti.types.struct(x=ti.math.vec3, trans=transform_type)

@ti.kernel
def kernel_with_nested_struct_arg(p: pos_type) -> ti.math.vec3:
    return p.trans.R @ p.x + p.trans.T

trans = transform_type(ti.math.mat3(1), [1, 1, 1])
p = pos_type(x=[1, 1, 1], trans=trans)
print(kernel_with_nested_struct_arg(p))  # [4., 4., 4.]
```
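The printed result can be verified by hand: assuming ti.math.mat3(1) fills every entry with 1 (consistent with the comment above), each component of R @ x sums a row of ones against x. A plain-Python check:

```python
# Plain-Python check of the kernel's expected output.
# R is the all-ones 3x3 matrix, x = [1, 1, 1], T = [1, 1, 1].
R = [[1.0] * 3 for _ in range(3)]
x = [1.0, 1.0, 1.0]
T = [1.0, 1.0, 1.0]

# result_i = sum_j R[i][j] * x[j] + T[i] = 3 + 1 = 4
result = [sum(R[i][j] * x[j] for j in range(3)) + T[i] for i in range(3)]
print(result)  # [4.0, 4.0, 4.0]
```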
- ti.frexp is now supported on the CUDA, Vulkan, Metal, and OpenGL backends.
- Added the ti.math.popcnt intrinsic (by Garry Ling).
- Switched the code formatter from yapf to black.

Highlights:
Full changelog:
- Switch the code formatter from yapf to black (#7785) (by **Proton**)

Published by github-actions[bot] over 1 year ago
We now support returning a struct on the LLVM-based backends (CPU and CUDA). The struct can contain vectors and matrices, and it can also be nested within other structs. Here's an example.
```python
s0 = ti.types.struct(a=ti.math.vec3, b=ti.i16)
s1 = ti.types.struct(a=ti.f32, b=s0)

@ti.kernel
def foo() -> s1:
    return s1(a=1, b=s0(a=ti.math.vec3(100, 0.2, 3), b=1))

print(foo())  # {'a': 1.0, 'b': {'a': [100.0, 0.2, 3.0], 'b': 1}}
```
Highlights:
- **AMDGPU backend**
- Enable shared array on amdgpu backend (#7403) (by **Zeyu Li**)
- Add print kernel amdgcn (#7357) (by **Zeyu Li**)
- Add amdgpu backend profiler (#7330) (by **Zeyu Li**)
- **Aot module**
- Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
- Deprecate element shape and field dim for AOT symbolic args (#7100) (by **Haidong Lan**)
- **Bug fixes**
- Fix copy_from() of StructField (#7294) (by **Yi Xu**)
- Fix caching same loop invariant global vars inside nested fors (#7285) (by **Lin Jiang**)
- Fix num_splits in parallel_struct_for (#7121) (by **Yi Xu**)
- Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by **Yi Xu**)
- **Documentation**
- Update GGUI docs with correct API (#7525) (by **pengyu**)
- Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
- Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
- Fix typo in API doc (#7511) (by **pengyu**)
- Update math_module (#7405) (by **Zhao Liang**)
- Update hello_world.md (#7400) (by **Zhao Liang**)
- Update debugging.md (#7401) (by **Zhao Liang**)
- Update hello_world.md (#7380) (by **Zhao Liang**)
- Update type.md (#7376) (by **Zhao Liang**)
- Update kernel_function.md (#7375) (by **Zhao Liang**)
- Update hello_world.md (#7369) (by **Zhao Liang**)
- Update hello_world.md (#7368) (by **Zhao Liang**)
- Update data_oriented_class.md (#6790) (by **Zhao Liang**)
- Update hello_world.md (#7367) (by **Zhao Liang**)
- Update kernel_function.md (#7364) (by **Zhao Liang**)
- Update hello_world.md (#7354) (by **Zhao Liang**)
- Update llvm_sparse_runtime.md (#7323) (by **Gabriel Vainer**)
- Update profiler.md (#7358) (by **Zhao Liang**)
- Update kernel_function.md (#7356) (by **Zhao Liang**)
- Update tut.md (#7352) (by **Gabriel Vainer**)
- Update type.md (#7350) (by **Zhao Liang**)
- Update hello_world.md (#7337) (by **Zhao Liang**)
- Update append docstring (#7265) (by **Zhao Liang**)
- Update ndarray.md (#7236) (by **Gabriel Vainer**)
- Update llvm_sparse_runtime.md (#7215) (by **Zhao Liang**)
- Remove doc tutorial (#7198) (by **Olinaaaloompa**)
- Rename tutorial doc (#7186) (by **Zhao Liang**)
- Update tutorial.md (#7176) (by **Zhao Liang**)
- Update math_module.md (#7175) (by **Zhao Liang**)
- Update debugging.md (#7173) (by **Zhao Liang**)
- Fix C++ tutorial does not display on doc site (#7174) (by **Zhao Liang**)
- Update doc regarding dynamic index (#7148) (by **Yi Xu**)
- Move glossary to top level (#7118) (by **Zhao Liang**)
- Update type.md (#7038) (by **Zhao Liang**)
- Fix docstring (#7065) (by **Zhao Liang**)
- **Error messages**
- Allow IfExp on matrices when the condition is scalar (#7241) (by **Lin Jiang**)
- Remove deprecations in ti.ui in 1.6.0 (#7229) (by **Lin Jiang**)
- Remove deprecated ti.linalg.sparse_matrix_builder in 1.6.0 (#7228) (by **Lin Jiang**)
- Remove deprecations in ASTTransformer in 1.6.0 (#7226) (by **Lin Jiang**)
- Remove deprecated a.atomic_op(b) in Taichi v1.6.0 (#7225) (by **Lin Jiang**)
- Remove deprecations in taichi/__init__.py in v1.6.0 (#7222) (by **Lin Jiang**)
- Raise error when using deprecated ifexp on matrices (#7224) (by **Lin Jiang**)
- Better error message when creating sparse snodes on backends that do not support sparse (#7191) (by **Lin Jiang**)
- Raise errors when using metal sparse (#7113) (by **Lin Jiang**)
- **GUI**
- GGUI use shader "factory" (GGUI rework n/N) (#7271) (by **Bob Cao**)
- **Intermediate representation**
- Unified type system for internal operations (#6337) (by **daylily**)
- **Language and syntax**
- Keep ti.pyfunc (#7530) (by **Lin Jiang**)
- Type check assignments between tensors (#7480) (by **Yi Xu**)
- Fix pylance warnings raised by ti.static (#7437) (by **Zhao Liang**)
- Deprecate arithmetic operations and fill() on ti.Struct (#7456) (by **Yi Xu**)
- Fix pylance warnnings by ti.random (#7439) (by **Zhao Liang**)
- Fix pylance types warning (#7417) (by **Zhao Liang**)
- Add better error message for dynamic snode (#7238) (by **Zhao Liang**)
- Simplify the swizzle generator (#7216) (by **Zhao Liang**)
- Remove the deprecated dynamic_index switch (#7195) (by **Yi Xu**)
- Remove deprecated packed switch (#7104) (by **Yi Xu**)
- Raise errors when using the packed switch (#7125) (by **Yi Xu**)
- Fix cannot use taichi in REPL (#7114) (by **Zhao Liang**)
- Remove deprecated ti.Matrix.rotation2d() (#7098) (by **Yi Xu**)
- Remove filename kwarg in aot Module save() (#7085) (by **Ailing**)
- Remove sourceinspect deprecation warning message (#7081) (by **Zhao Liang**)
- Make slicing a single row/column of a matrix return a vector (#7068) (by **Yi Xu**)
- **Miscellaneous**
- Strictly check ndim with external array (#7126) (by **Haidong Lan**)
Full changelog:
- [cc] Add deprecation notice for cc backend (#7651) (by **Ailing**)
- [misc] Cherry pick struct return related commits (#7575) (by **Haidong Lan**)
- [Lang] Keep ti.pyfunc (#7530) (by **Lin Jiang**)
- [bug] Fix symbol conflicts with taichi_cpp_tests (#7528) (by **Zhanlue Yang**)
- [bug] Fix numerical issue with TensorType'd arithmetics (#7526) (by **Zhanlue Yang**)
- [aot] Enable Metal AOT test (#7461) (by **PENGUINLIONG**)
- [Doc] Update GGUI docs with correct API (#7525) (by **pengyu**)
- [misc] Implement KernelCompialtionManager::clean_offline_cache (#7515) (by **PGZXB**)
- [ir] Except shared array from demote atomics pass. (#7513) (by **Haidong Lan**)
- [bug] Fix error with windows-clang compilation for cuda_runtime.cu (#7519) (by **Zhanlue Yang**)
- [misc] Deprecate field dim and update deprecation warnings (#7491) (by **Haidong Lan**)
- [build] Fix build failure without nvcc (#7521) (by **Ailing**)
- [Doc] Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
- [aot] Kernel argument count limit (#7518) (by **PENGUINLIONG**)
- [Doc] Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
- [AOT] [llvm] Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
- [llvm] Let the offline cache record the type info of arguments and return values (by **lin-hitonami**)
- [ir] Separate LaunchContextBuilder from Kernel (by **lin-hitonami**)
- [Doc] Fix typo in API doc (#7511) (by **pengyu**)
- [aot] Build Runtime C-API by default (#7508) (by **PENGUINLIONG**)
- [bug] Fix run_tests.py --with-offline-cache (#7507) (by **PGZXB**)
- [vulkan] Support printing constant strings containing % (#7499) (by **魔法少女赵志辉**)
- [ci] Fix nightly version number, 2nd try (#7501) (by **Proton**)
- [aot] Fixed memory leak in metal backend (#7500) (by **PENGUINLIONG**)
- [ci] Fix nightly version number issue (#7498) (by **Proton**)
- [example] Remove cv2, cairo dependency (#7496) (by **Zhao Liang**)
- [type] Let Type * be serializable (by **lin-hitonami**)
- [ci] Second attempt at permission check for ghstack landing (#7490) (by **Proton**)
- [docs] Reword words of warning about building from source (#7488) (by **Anselm Schüler**)
- [lang] Fixed double release of Metal command buffer (#7484) (by **PENGUINLIONG**)
- [ci] Switch Android bots lock redis to bot-master (#7482) (by **Proton**)
- [ci] Status check of ghstack CI bot (#7479) (by **Proton**)
- [Lang] Type check assignments between tensors (#7480) (by **Yi Xu**)
- [doc] Fix typo in ndarray.md (#7476) (by **Chenzhan Shang**)
- [opt] Enable half2 optimization for atomic_add operations on CUDA backend (#7465) (by **Zhanlue Yang**)
- [Lang] Fix pylance warnings raised by ti.static (#7437) (by **Zhao Liang**)
- Let the LaunchContextBuilder manage the result buffer (by **lin-hitonami**)
- [ci] Fix nightly build failure, and minor improvements (#7475) (by **Proton**)
- [ci] Fix duplicated names in aot tests (#7471) (by **Ailing**)
- [lang] Improve float16 support from Taichi type system (#7402) (by **Zhanlue Yang**)
- [Lang] Deprecate arithmetic operations and fill() on ti.Struct (#7456) (by **Yi Xu**)
- [misc] Add out of bound check for ndarray (#7458) (by **Ailing**)
- [aot] Remove graph kernel interfaces (#7466) (by **PENGUINLIONG**)
- [llvm] Let the RuntimeContext use the host result buffer (by **lin-hitonami**)
- [gui] Fix 3d line drawing & add test (#7454) (by **Bob Cao**)
- [lang] Fixed texture assertions (#7450) (by **PENGUINLIONG**)
- [aot] Fixed header generator (#7455) (by **PENGUINLIONG**)
- [aot] AOT module convention GfxRuntime140 (#7440) (by **PENGUINLIONG**)
- [misc] Add an explicit error in cc backend codegen for dynamic indexing (#7449) (by **Ailing**)
- [ci] Lower C++ tests concurrency (#7451) (by **Proton**)
- [aot] Properly handle texture attributes (#7433) (by **PENGUINLIONG**)
- [Lang] Fix pylance warnnings by ti.random (#7439) (by **Zhao Liang**)
- [ir] Get the StructType of the kernel parameters (by **lin-hitonami**)
- [ci] Report failure (not throwing exception) when C++ tests fail (#7435) (by **Proton**)
- [llvm] Allocate the result buffer from preallocated memory (by **lin-hitonami**)
- [vulkan] Fix GGUI and vulkan swapchain on AMD drivers (#7382) (by **Bob Cao**)
- [autodiff] Handle return statement (#7389) (by **Mingrui Zhang**)
- [misc] Remove unnecessary functions of gfx::AotModuleBuilderImpl (#7425) (by **PGZXB**)
- [bug] Fix offline_cache::clean_offline_cache_files (ti cache clean) (#7426) (by **PGZXB**)
- [test] Refactor C++ tests runner (#7421) (by **Proton**)
- [ci] Adjust perfmon GPU freq (#7429) (by **Proton**)
- [misc] Remove AotModuleParams::enable_lazy_loading (#7424) (by **PGZXB**)
- [aot] Use graphs.json instead of TCB (#7392) (by **PENGUINLIONG**)
- [refactor] Introduce KernelCompilationManager (#7409) (by **PGZXB**)
- [IR] Unified type system for internal operations (#6337) (by **daylily**)
- [lang] Add is_lvalue() to Expr to check writeback_binary operand (#7414) (by **魔法少女赵志辉**)
- [bug] Fix get_error_string ret type typo (#7418) (by **Zeyu Li**)
- [aot] Reorganize graph argument creation process (#7412) (by **PENGUINLIONG**)
- [Amdgpu] Enable shared array on amdgpu backend (#7403) (by **Zeyu Li**)
- [Lang] Fix pylance types warning (#7417) (by **Zhao Liang**)
- [aot] Simplify device capability assignment (#7407) (by **PENGUINLIONG**)
- [Doc] Update math_module (#7405) (by **Zhao Liang**)
- [ci] Lock GPU frequency in perf benchmarking (#7413) (by **Proton**)
- [ci] Add 'Needed single revision' workaround to all tasks (#7408) (by **Proton**)
- [Doc] Update hello_world.md (#7400) (by **Zhao Liang**)
- [refactor] Introduce KernelCompiler and implement spirv::KernelCompiler (#7371) (by **PGZXB**)
- [Amdgpu] Add print kernel amdgcn (#7357) (by **Zeyu Li**)
- [Doc] Update debugging.md (#7401) (by **Zhao Liang**)
- [refactor] Disable ASTSerializer::allow_undefined_visitor (#7391) (by **PGZXB**)
- [amdgpu] Enable llvm FpOpFusion option on AMDGPU backend (#7398) (by **Zeyu Li**)
- [aot] Add test for shared array (#7387) (by **Ailing**)
- [vulkan] Change command list submit error message & misc device API cleanups (#7395) (by **Bob Cao**)
- [bug] Fix arch_uses_spirv (#7399) (by **PGZXB**)
- [gui] Fix ggui & vulkan swapchain sizes on HiDPI displays (#7394) (by **Bob Cao**)
- [Doc] Update hello_world.md (#7380) (by **Zhao Liang**)
- [aot] Remove support for depth24stencil8 format on Metal (#7377) (by **PENGUINLIONG**)
- [bug] Add DeviceCapabilityConfig to offline cache key (#7384) (by **PGZXB**)
- [Doc] Update type.md (#7376) (by **Zhao Liang**)
- [refactor] Remove dependencies on Callable::program in cpp tests (#7373) (by **PGZXB**)
- [lang] Experimental support of conjugate gradient solver (#7035) (by **pengyu**)
- [aot] Metal interop APIs (#7366) (by **PENGUINLIONG**)
- [Doc] Update kernel_function.md (#7375) (by **Zhao Liang**)
- [gui] Add `fps_limit` for GGUI (#7374) (by **Bob Cao**)
- [Doc] Update hello_world.md (#7369) (by **Zhao Liang**)
- [aot] Fix blockers in static library build with XCode (#7365) (by **PENGUINLIONG**)
- [vulkan] Remove GLFW from Vulkan rhi dependency (#7351) (by **Bob Cao**)
- [misc] Remove useless semicolon in llvm_program.h (#7372) (by **PGZXB**)
- [Doc] Update hello_world.md (#7368) (by **Zhao Liang**)
- [Amdgpu] Add amdgpu backend profiler (#7330) (by **Zeyu Li**)
- [lang] Stop broadcasting scalar cond in select statements (#7344) (by **魔法少女赵志辉**)
- [bug] Fix validation erros due to inactive VK_KHR_16bit_storage (#7360) (by **Zhanlue Yang**)
- [aot] Support texture in Metal (#7363) (by **PENGUINLIONG**)
- [Doc] Update data_oriented_class.md (#6790) (by **Zhao Liang**)
- [Doc] Update hello_world.md (#7367) (by **Zhao Liang**)
- [refactor] Introduce lang::CompiledKernelData (#7340) (by **PGZXB**)
- [bug] Fix matrix initialization error with numpy.floating data (#7362) (by **Zhanlue Yang**)
- [Doc] Update kernel_function.md (#7364) (by **Zhao Liang**)
- [test] [amdgpu] Fix bug with allocs bb in function body (#7308) (by **Zeyu Li**)
- [Doc] Update hello_world.md (#7354) (by **Zhao Liang**)
- [aot] Fixed C-API docs (#7361) (by **PENGUINLIONG**)
- [refactor] Remove dependencies on Callable::program in lang::CompiledGraph::run (#7288) (by **PGZXB**)
- [DOC] Update llvm_sparse_runtime.md (#7323) (by **Gabriel Vainer**)
- [Doc] Update profiler.md (#7358) (by **Zhao Liang**)
- [Doc] Update kernel_function.md (#7356) (by **Zhao Liang**)
- [aot] Improve Taichi C++ wrapper implementation (#7347) (by **PENGUINLIONG**)
- [Doc] Update tut.md (#7352) (by **Gabriel Vainer**)
- [ci] Add doc snippet CI requirements (#7355) (by **Proton**)
- [amdgpu] Update device memory free (#7346) (by **Zeyu Li**)
- [Doc] Update type.md (#7350) (by **Zhao Liang**)
- [aot] Enable 16-bit dtype support for Taichi AOT (#7315) (by **Zhanlue Yang**)
- [example] Re-implement the Cornell Box demo with shorter lines of code (#7252) (by **HK-SHAO**)
- [aot] AOT CI refactorization (#7339) (by **PENGUINLIONG**)
- [llvm] Let the kernel return struct (by **lin-hitonami**)
- [Doc] Update hello_world.md (#7337) (by **Zhao Liang**)
- [ci] Reduce doc test concurrency (#7336) (by **Proton**)
- [ir] Refactor result fetching (by **lin-hitonami**)
- [ir] Get the offsets of elements in StructType (by **lin-hitonami**)
- [misc] Delete test.py (#7332) (by **Bob Cao**)
- [vulkan] More subgroup operations (#7328) (by **Bob Cao**)
- [vulkan] Add vulkan profiler (#7295) (by **Haidong Lan**)
- [refactor] Move TaichiLLVMContext::runtime_jit_module and TaichiLLVMContext::create_jit_module() to LlvmRuntimeExecutor (#7320) (by **PGZXB**)
- [refactor] Remove dependencies on LlvmProgramImpl::get_llvm_context() in TaskCodeGenLLVM (#7321) (by **PGZXB**)
- [ci] Checkout with privileged token when landing ghstack PRs (#7331) (by **Proton**)
- [ir] Add fields to StructType (by **lin-hitonami**)
- [gui] Remove renderable reuse & make renderable immediate (#7327) (by **Bob Cao**)
- [Gui] GGUI use shader "factory" (GGUI rework n/N) (#7271) (by **Bob Cao**)
- [bug] Fix u64 field cannot be assigned value >= 2 ** 63 (#7319) (by **Lin Jiang**)
- [type] Let the compute type of quant uint be unsigned int (by **lin-hitonami**)
- [doc] Replace slack with discord (#7318) (by **yanqingzhang**)
- [refactor] Change print statement to warnings.warn in taichi.lang.util.warning (#7301) (by **Jett Chen**)
- [ci] ChatOps: ghstack land (#7314) (by **Proton**)
- [refactor] Remove TaichiLLVMContext::lookup_function_pointer() (#7312) (by **PGZXB**)
- [misc] Update MSVC flags (#7254) (by **Bob Cao**)
- [doc] [ci] Cover code snippets in docs (#7309) (by **Proton**)
- [refactor] Remove dependencies on LlvmProgramImpl::get_llvm_context() in KernelCodeGen (#7289) (by **PGZXB**)
- [rhi] Device upload readback functions (#7278) (by **Bob Cao**)
- [aot] Fixed external project inclusion (#7297) (by **PENGUINLIONG**)
- [Doc] Update append docstring (#7265) (by **Zhao Liang**)
- [refactor] Remove dependencies on Callable::program in lang::get_hashed_offline_cache_key (#7287) (by **PGZXB**)
- [ci] [amdgpu] Enable amdgpu backend python unit tests (#7293) (by **Zeyu Li**)
- [Bug] Fix copy_from() of StructField (#7294) (by **Yi Xu**)
- [ci] Adapt new Android phone behavior (#7306) (by **Proton**)
- [Bug] Fix caching same loop invariant global vars inside nested fors (#7285) (by **Lin Jiang**)
- [amdgpu] Part5 enable the api of amdgpu (#7202) (by **Zeyu Li**)
- [amdgpu] Enable struct for on amdgpu backend (#7247) (by **Zeyu Li**)
- [misc] Update external/asset which was accidentally downgraded in #7248 (#7284) (by **Lin Jiang**)
- [amdgpu] Update runtime module (#7248) (by **Zeyu Li**)
- [llvm] Remove unused argument 'arch' in LlvmProgramImpl::get_llvm_context (#7282) (by **Lin Jiang**)
- [misc] Remove deprecated kwarg in rw_texture type annotations (#7267) (by **Ailing**)
- [ci] Tolerate duplicates when registering version (#7281) (by **Proton**)
- [gui] Fix GGUI destruction order (#7279) (by **Bob Cao**)
- [doc] Rename /doc/ndarray_android to /doc/tutorial (#7273) (by **Lin Jiang**)
- [llvm] Unify the llvm context of host and device (#7249) (by **Lin Jiang**)
- [misc] Fix manylinux2014 warning not printing (#7270) (by **Proton**)
- [ci] Building: add complete PATH set for conda (#7268) (by **Proton**)
- [autodiff] Support rsqrt operator (#7259) (by **Mingrui Zhang**)
- [ci] Update pre-commit repos version (#7257) (by **Proton**)
- [refactor] Fix "const CompileConfig *" to "const CompileConfig &" (Part2) (#7253) (by **PGZXB**)
- [refactor] Fix "const CompileConfig *" to "const CompileConfig &" (#7243) (by **PGZXB**)
- [aot] Added third-party render thread task injection for Unity (#7151) (by **PENGUINLIONG**)
- [aot] Support statically linked C-API library on MacOS (#7207) (by **Zhanlue Yang**)
- [gui] Force GGUI to go through host memory (nuking interops) (#7218) (by **Bob Cao**)
- [Error] Allow IfExp on matrices when the condition is scalar (#7241) (by **Lin Jiang**)
- [bug] Fix the parity of the RNG (#7239) (by **Lin Jiang**)
- [Lang] Add better error message for dynamic snode (#7238) (by **Zhao Liang**)
- [DOC] Update ndarray.md (#7236) (by **Gabriel Vainer**)
- [Error] Remove deprecations in ti.ui in 1.6.0 (#7229) (by **Lin Jiang**)
- [Doc] Update llvm_sparse_runtime.md (#7215) (by **Zhao Liang**)
- [lang] Add validation checks for subscripts to reject negative indices (#7212) (by **Zhanlue Yang**)
- [refactor] Remove legacy num_bits and acc_offsets from AxisExtractor (#7227) (by **Yi Xu**)
- [Error] Remove deprecated ti.linalg.sparse_matrix_builder in 1.6.0 (#7228) (by **Lin Jiang**)
- [Error] Remove deprecations in ASTTransformer in 1.6.0 (#7226) (by **Lin Jiang**)
- [misc] Export DeviceAllocation into Python & support devalloc in field_info (#7233) (by **Bob Cao**)
- [gui] Use templated bulk copy to simplify VBO preperation (#7234) (by **Bob Cao**)
- [rhi] Add create_image_unique stub & misc RHI bug fixes (#7232) (by **Bob Cao**)
- [opengl] Fix GLFW global context issue (#7230) (by **Bob Cao**)
- [examples] Remove dependency on `ti.u8` compute type for ngp (#7220) (by **Bob Cao**)
- [refactor] Remove Kernel::offload_to_executable (#7210) (by **PGZXB**)
- [opengl] RW image binding & FP16 support (#7219) (by **Bob Cao**)
- [Error] Remove deprecated a.atomic_op(b) in Taichi v1.6.0 (#7225) (by **Lin Jiang**)
- [Error] Remove deprecations in taichi/__init__.py in v1.6.0 (#7222) (by **Lin Jiang**)
- [Error] Raise error when using deprecated ifexp on matrices (#7224) (by **Lin Jiang**)
- [refactor] Remove legacy BitExtractStmt (#7221) (by **Yi Xu**)
- [amdgpu] Part4 link bitcode file (#7180) (by **Zeyu Li**)
- [example] Reorganize example oit_renderer (#7208) (by **Lin Jiang**)
- [aot] Fix ndarray aot with information from type hints (#7214) (by **Ailing**)
- [gui] Fix wide line support on macOS (#7205) (by **Bob Cao**)
- [Lang] Simplify the swizzle generator (#7216) (by **Zhao Liang**)
- [refactor] Split constructing and compilation of lang::Function (#7209) (by **PGZXB**)
- [doc] Fix netlify build command (#7217) (by **Ailing**)
- [ci] M1 buildbot release tag (#7213) (by **Proton**)
- [misc] Remove unused task_funcs (#7211) (by **PGZXB**)
- [refactor] Program::this_thread_config() -> Program::compile_config() (#7199) (by **PGZXB**)
- [doc] Fix format issues of windows debugging (#7197) (by **Olinaaaloompa**)
- [aot] More OpenGL interop in C-API (#7204) (by **PENGUINLIONG**)
- [metal] Disable a kernel test in offline cache to unblock CI (#7154) (by **Ailing**)
- [ci] Switch Windows build script to build.py (#6993) (by **Proton**)
- [misc] Update submodule taichi_assets (#7203) (by **Lin Jiang**)
- [mac] Use ObjectLinkingLayer instead of RTDyldObjectLinkingLayer for aarch64 mac (#7201) (by **Ailing**)
- [misc] Remove unused Program::jit_evaluator_id (#7200) (by **PGZXB**)
- [misc] Remove legacy latex generation (#7196) (by **Yi Xu**)
- [Lang] Remove the deprecated dynamic_index switch (#7195) (by **Yi Xu**)
- [bug] Fix check_matched() failure with Ndarray holding TensorType'd element (#7178) (by **Zhanlue Yang**)
- [Doc] Remove doc tutorial (#7198) (by **Olinaaaloompa**)
- [bug] Fix example circle-packing (#7194) (by **Lin Jiang**)
- [aot] C-API opengl runtime interop (#7120) (by **damnkk**)
- [Error] Better error message when creating sparse snodes on backends that do not support sparse (#7191) (by **Lin Jiang**)
- [example] Fix ti gallery close warning (#7187) (by **Zhao Liang**)
- [lang] Interface refactors for MatrixType and VectorType (#7143) (by **Zhanlue Yang**)
- [aot] Find Taichi in python wheel (#7181) (by **PENGUINLIONG**)
- [gui] Update circles rendering to use quads (#7163) (by **Bob Cao**)
- [Doc] Rename tutorial doc (#7186) (by **Zhao Liang**)
- [ir] Fix gcc cannot compile inline template specialization (#7179) (by **Lin Jiang**)
- [Doc] Update tutorial.md (#7176) (by **Zhao Liang**)
- [aot] Replace std::exchange with local implementation for C++11 (#7170) (by **PENGUINLIONG**)
- [ci] Fix near cache urls (missing comma) (#7158) (by **Proton**)
- [docs] Create windows_debug.md (#7164) (by **Bob Cao**)
- [Doc] Update math_module.md (#7175) (by **Zhao Liang**)
- [aot] FindTaichi CMake module to help outside project integration (#7168) (by **PENGUINLIONG**)
- [aot] Removed unused archs in C-API (#7167) (by **PENGUINLIONG**)
- [Doc] Update debugging.md (#7173) (by **Zhao Liang**)
- [refactor] Remove dependencies on Program::this_thread_config() in irpass::constant_fold (#7159) (by **PGZXB**)
- [Doc] Fix C++ tutorial does not display on doc site (#7174) (by **Zhao Liang**)
- [aot] C++ wrapper for memory slice and memory allocation with host access (#7171) (by **PENGUINLIONG**)
- [aot] Fixed ti_get_last_error signature (#7165) (by **PENGUINLIONG**)
- [misc] Log to stderr instead of stdout (#7166) (by **PENGUINLIONG**)
- [aot] C-API get version wrapper (#7169) (by **PENGUINLIONG**)
- [doc] Fix spelling of "paticle_field" (#7024) (by **Xiang (Kevin) Li**)
- [misc] Remove useless Program::sync (#7160) (by **PGZXB**)
- [doc] Update accelerate_python.md to use ti.max (#7161) (by **Tao Jin**)
- [doc] Add doc ndarray (#7157) (by **Olinaaaloompa**)
- [mac] Add .dylib and .cmake to built wheel (#7156) (by **Ailing**)
- [refactor] Remove dependencies on Program::this_thread_config() in some tests (#7155) (by **PGZXB**)
- [refactor] Remove dependencies on Program::this_thread_config() in llvm backends codegen (#7153) (by **PGZXB**)
- [Lang] Remove deprecated packed switch (#7104) (by **Yi Xu**)
- [example] Update quaternion arithmetics in fractal_3d_ggui (#7139) (by **Zhao Liang**)
- [doc] Update field.md (Fields advanced) (#6867) (by **Gabriel Vainer**)
- [ci] Use make_changelog.py to generate the full changelog (#7152) (by **Lin Jiang**)
- [refactor] Rename Callable::*arg* to Callable::*param* (#7133) (by **PGZXB**)
- [aot] Introduce new AOT deployment tutorial (#7144) (by **PENGUINLIONG**)
- [bug] Unify error message matching with/without validation layers for CapiTest.FailMapDeviceOnlyMemory (#7110) (by **Zhanlue Yang**)
- [lang] Remove redundant TensorType expansion for function returns (#7124) (by **Zhanlue Yang**)
- [lang] Sign python library for Apple M1 (#7138) (by **PENGUINLIONG**)
- [gui] Fix particle size limits (#7149) (by **Bob Cao**)
- [lang] Migrate TensorType expansion in MatrixType/VectorType from Python code to Frontend IR (#7127) (by **Zhanlue Yang**)
- [aot] Support texture arguments for AOT kernels (#7142) (by **Zhanlue Yang**)
- [metal] Retain Metal commandBuffers & build command buffers directly (#7137) (by **Bob Cao**)
- [rhi] Update `create_pipeline` API and add support of VkPipelineCache (#7091) (by **Bob Cao**)
- [autodiff] Support grad in ndarray (#6906) (by **PhrygianGates**)
- [Doc] Update doc regarding dynamic index (#7148) (by **Yi Xu**)
- [refactor] Remove dependencies on Program::this_thread_config() in spirv::lower (#7134) (by **PGZXB**)
- [Misc] Strictly check ndim with external array (#7126) (by **Haidong Lan**)
- [ci] Run test when pushing to rc branches (#7146) (by **Lin Jiang**)
- [refactor] Remove dependencies on Program::this_thread_config() in KernelCodeGen (#7086) (by **PGZXB**)
- [ci] Disable backward_cpp on macOS (#7145) (by **Proton**)
- [gui] Fix scene line renderable (#7131) (by **Bob Cao**)
- [refactor] Remove useless Kernel::from_cache_ (#7132) (by **PGZXB**)
- [cpu] Reuse VirtualMemoryAllocator for CPU ndarray memory allocation (#7128) (by **Ailing**)
- [Lang] Raise errors when using the packed switch (#7125) (by **Yi Xu**)
- [ci] Temporarily disable ad_external_array on Metal (#7136) (by **Bob Cao**)
- [Error] Raise errors when using metal sparse (#7113) (by **Lin Jiang**)
- [aot] AOT compat test in workflow (#7033) (by **damnkk**)
- [Lang] Fix cannot use taichi in REPL (#7114) (by **Zhao Liang**)
- [lang] Free ndarray memory when it's GC-ed in Python (#7072) (by **Ailing**)
- [lang] Migrate TensorType expansion for FuncCallExpression from Python code to Frontend IR (#6980) (by **Zhanlue Yang**)
- [amdgpu] Part2 add runtime (#6482) (by **Zeyu Li**)
- [refactor] Remove dependencies on Program::this_thread_config() in codegen_cc.cpp (#7088) (by **PGZXB**)
- [refactor] Remove dependencies on Program::this_thread_config() in gfx::run_codegen (#7089) (by **PGZXB**)
- [Bug] Fix num_splits in parallel_struct_for (#7121) (by **Yi Xu**)
- [Doc] Move glossary to top level (#7118) (by **Zhao Liang**)
- [metal] Update Metal RHI impl & add support for shared arrays (#7107) (by **Bob Cao**)
- [ci] Update amdgpu ci (#7117) (by **Zeyu Li**)
- [refactor] Move Kernel::lower() outside the taichi::lang::Kernel (#7048) (by **PGZXB**)
- [amdgpu] Part1 add codegen (#6469) (by **Zeyu Li**)
- [Aot] Deprecate element shape and field dim for AOT symbolic args (#7100) (by **Haidong Lan**)
- [refactor] Remove Program::current_ast_builder() (#7075) (by **PGZXB**)
- [aot] Switch Metal to SPIR-V codegen (#7093) (by **PENGUINLIONG**)
- [Lang] Remove deprecated ti.Matrix.rotation2d() (#7098) (by **Yi Xu**)
- [doc] Modified some errors in the function examples (#7094) (by **welann**)
- [ci] More Windows git hacks (#7102) (by **Proton**)
- [Lang] Remove filename kwarg in aot Module save() (#7085) (by **Ailing**)
- [aot] Rename device capability atomic_i64 to atomic_int64 for consistency (#7095) (by **PENGUINLIONG**)
- [Lang] Remove sourceinspect deprecation warning message (#7081) (by **Zhao Liang**)
- [example] Remove gui warning message (#7090) (by **Zhao Liang**)
- [refactor] Remove unnecessary Kernel::arch (#7074) (by **PGZXB**)
- [refactor] Remove unnecessary parameter of irpass::scalarize (#7087) (by **PGZXB**)
- [Bug] Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by **Yi Xu**)
- [lang] Migrate TensorType expansion for TextureOpExpression from Python code to Frontend IR (#6968) (by **Zhanlue Yang**)
- [lang] Migrate TensorType expansion for ReturnStmt from Python code to Frontend IR (#6946) (by **Zhanlue Yang**)
- [doc] Update ndarray deprecation warning to 1.5.0 (#7083) (by **Haidong Lan**)
- [amdgpu] Update amdgpu module call (#7022) (by **Zeyu Li**)
- [amdgpu] Add convert addressspace pass related unit test (#7023) (by **Zeyu Li**)
- [ir] Let real function return nested StructType (by **lin-hitonami**)
- [ir] Replace FuncCallExpression with FrontendFuncCallStmt (by **lin-hitonami**)
- [example] Update gallery images (#7053) (by **Zhao Liang**)
- [Doc] Update type.md (#7038) (by **Zhao Liang**)
- [misc] Bump version to v1.5.0 (#7077) (by **Lin Jiang**)
- [rhi] Update Stream `new_command_list` API (#7073) (by **Bob Cao**)
- [Doc] Fix docstring (#7065) (by **Zhao Liang**)
- [ci] Workaround windows checkout 'Needed a single revision' issue (#7078) (by **Proton**)
- [Lang] Make slicing a single row/column of a matrix return a vector (#7068) (by **Yi Xu**)
Published by github-actions[bot] over 1 year ago
Highlights:
Full changelog:
Published by github-actions[bot] almost 2 years ago
Taichi AOT is officially available in Taichi v1.4.0, along with a native Taichi Runtime (TiRT) library taichi_c_api. Native applications can now load compiled AOT modules and launch Taichi kernels without a Python interpreter.
In this release, TiRT has stabilized the Vulkan backend on desktop platforms and Android. You can find prebuilt TiRT binaries on the release page. You can refer to a comprehensive tutorial on the doc site; the detailed TiRT C-API documentation is available at https://docs.taichi-lang.org/docs/taichi_core.
Taichi ndarray is now formally released in v1.4.0. The ndarray is an array object that holds contiguous multi-dimensional data to allow easy exchange with external libraries. See documentation for more details.
Before v1.4.0, when you wanted to access a vector/matrix with a runtime variable instead of a compile-time constant, you had to set ti.init(dynamic_index=True). However, that option only works for LLVM-based backends (CPU & CUDA) and may slow down runtime performance because all matrices are affected. Starting from v1.4.0, that option is no longer needed. You can use variable indices whenever necessary on all backends without affecting the performance of those matrices with only constant indices.
Since v1.0.0, we have been enriching our taichi example collection, bringing the number of demos in the gallery window from eight to twelve. Run ti gallery to check out some new demos!
- Supports `to_numpy()` and `from_numpy()`. (#7008)

Published by github-actions[bot] almost 2 years ago
- The `packed` switch in `ti.init()` is now deprecated and will be removed in v1.4.0. See the feature introduction below for details.
- `ti.Matrix.rotation2d()` is now deprecated and will be removed in v1.4.0. Use `ti.math.rotation2d()` instead.
- Calling `transpose()` on a vector is no longer allowed. If you want something like `a @ b.transpose()`, write `a.outer_product(b)` instead.
- The ndarray annotation arguments `element_dim`, `element_shape`, and `field_dim` will be deprecated in v1.4.0. `field_dim` is renamed to `ndim` to make it more intuitive, and `element_dim` and `element_shape` will be replaced by passing a matrix type into the `dtype` argument. For example, `ti.types.ndarray(element_dim=2, element_shape=(3,3))` will be replaced by `ti.types.ndarray(dtype=ti.matrix(3,3))`.

To support variable-length fields, Taichi provides dynamic SNodes. You can now use the dynamic SNode on fields of different data types, even struct fields and matrix fields. You can use `x[i].append(...)` to append an element, `x[i].length()` to get the length, and `x[i].deactivate()` to clear the list, as shown in the following code snippet.
import taichi as ti

ti.init()  # dynamic SNodes require a backend with sparse support, e.g. CPU or CUDA

pair = ti.types.struct(a=ti.i16, b=ti.i64)
pair_field = pair.field()

block = ti.root.dense(ti.i, 4)
pixel = block.dynamic(ti.j, 100, chunk_size=4)
pixel.place(pair_field)

l = ti.field(ti.i32)
ti.root.dense(ti.i, 5).place(l)

@ti.kernel
def dynamic_pair():
    for i in range(4):
        pair_field[i].deactivate()
        for j in range(i * i):
            pair_field[i].append(pair(i, j + 1))
        # pair_field = [[],
        #               [(1, 1)],
        #               [(2, 1), (2, 2), (2, 3), (2, 4)],
        #               [(3, 1), (3, 2), ..., (3, 8), (3, 9)]]
        l[i] = pair_field[i].length()  # l = [0, 1, 4, 9]
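The `chunk_size` argument controls how a dynamic SNode grows: memory is allocated one fixed-size chunk at a time rather than per element. A plain-Python sketch of that growth policy (illustrative only, not Taichi's actual allocator):

```python
class ChunkedList:
    """Append-only list that allocates storage in fixed-size chunks."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = []   # each entry is one fixed-size chunk
        self.length = 0

    def append(self, x):
        if self.length % self.chunk_size == 0:
            # Current chunk is full (or none exists yet): allocate a new one.
            self.chunks.append([None] * self.chunk_size)
        self.chunks[-1][self.length % self.chunk_size] = x
        self.length += 1


lst = ChunkedList(chunk_size=4)
for v in range(9):
    lst.append(v)
print(lst.length, len(lst.chunks))  # 9 elements occupy 3 chunks of 4 slots
```

A larger `chunk_size` means fewer allocations but more slack space at the tail of each list.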
Packed mode was introduced in v0.8.0 to allow users to trade runtime performance for memory usage. In v1.3.0, after the elimination of runtime overhead in common cases, packed mode has become the default mode. There's no longer any automatic padding behavior behind the scenes, so users can use fields and SNodes without surprise.
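Concretely, the old automatic padding rounded every axis up to the next power of two, so a field of shape `(18, 65)` silently occupied `(32, 128)` cells. A plain-Python sketch of that now-removed behavior (an illustration, not Taichi code):

```python
import math

def pow2_padded(shape):
    # Pre-packed-mode behavior: each axis is rounded up to the next power of two.
    return tuple((1 << math.ceil(math.log2(s))) if s > 1 else 1 for s in shape)

print(pow2_padded((18, 65)))  # (32, 128): ~3.5x the 18*65 cells actually requested
```

With packed mode as the default, a field occupies exactly the shape you declare.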
We introduce the experimental sparse matrix and sparse solver on the CUDA backend. The APIs are the same as those on the CPU backend. Currently, only the `f32` data type and the LLT linear solver are supported on CUDA, and you can only use `ti.ndarray` for SpMV and linear-solve operations. The float64 data type and other linear solvers are under implementation.

Run `taichi_ngp` to check it out!

Starting from v1.3.0, Taichi has upgraded its LLVM dependency to version 15.0.0. If you're interested in contributing or simply building Taichi from source, please follow our installation doc for developers.
Note this change has no impact on Taichi users.
Published by github-actions[bot] almost 2 years ago
The MoltenVK version is downgraded to v1.1.10 to fix a few GGUI issues.
Full changelog:
Published by github-actions[bot] almost 2 years ago
This is a bug fix release for v1.2.0.
Full changelog:
Published by github-actions[bot] almost 2 years ago
Starting from the v1.2.0 release, Taichi follows semantic versioning: regular releases cut from the master branch bump the MINOR version, and the PATCH version is bumped only when critical bug fixes are cherry-picked.

Indexing a multi-dimensional `ti.ndrange()` with a single loop index will be disallowed in future releases.
We introduced the offline cache on CPU and CUDA backends in v1.1.0. In this release, we support this feature on other backends, including Vulkan, OpenGL, and Metal.
If your code behaves abnormally, disable the offline cache by setting the environment variable `TI_OFFLINE_CACHE=0` or passing `offline_cache=False` in the `ti.init()` method call, and file an issue with us on Taichi's GitHub repo.

A checker is provided for detecting potential violations of global data access rules.

1. Set `debug=True` when calling `ti.init()`.
2. Set `validation=True` when using `ti.ad.Tape()` to validate the kernels captured by `ti.ad.Tape()`.

For example:
import taichi as ti

ti.init(debug=True)

N = 5

x = ti.field(dtype=ti.f32, shape=N, needs_grad=True)
loss = ti.field(dtype=ti.f32, shape=(), needs_grad=True)
b = ti.field(dtype=ti.f32, shape=(), needs_grad=True)

@ti.kernel
def func_1():
    for i in range(N):
        loss[None] += x[i] * b[None]

@ti.kernel
def func_2():
    b[None] += 100

b[None] = 10
with ti.ad.Tape(loss, validation=True):
    func_1()
    func_2()
"""
taichi.lang.exception.TaichiAssertionError:
(kernel=func_2_c78_0) Breaks the global data access rule. Snode S10 is overwritten unexpectedly.
File "across_kernel.py", line 16, in func_2:
b[None] += 100
^^^^^^^^^^^^^^
"""
Improved Vulkan performance with loops (#6072) (by Lin Jiang)
PrefixSumExecutor is added to improve the performance of prefix-sum operations. The legacy prefix-sum function allocates auxiliary GPU buffers at every call, which causes a noticeable performance problem. The new PrefixSumExecutor avoids these repeated allocations: for arrays of the same length, the executor only needs to be initialized once and can then perform any number of prefix-sum operations without redundant field allocations. The prefix-sum operation is currently supported only on the CUDA backend. (#6132) (by Yu Zhang)
Usage:
import taichi as ti

ti.init(arch=ti.cuda)  # PrefixSumExecutor is currently CUDA-only

dtype = ti.i32
N = 100

arr0 = ti.field(dtype, N)
arr1 = ti.field(dtype, N)
arr2 = ti.field(dtype, N)
arr3 = ti.field(dtype, N)
arr4 = ti.field(dtype, N)

# initialize arr0, arr1, arr2, arr3, arr4, ...
# ...

# Perform an inclusive in-place parallel prefix sum;
# only one executor is needed for a given array length.
executor = ti.algorithms.PrefixSumExecutor(N)
executor.run(arr0)
executor.run(arr1)
executor.run(arr2)
executor.run(arr3)
executor.run(arr4)
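For reference, the inclusive in-place scan that `executor.run()` performs has these semantics (a plain-Python sketch of the result, not the parallel CUDA implementation):

```python
def inclusive_prefix_sum(arr):
    # After the scan, arr[i] holds sum(arr[0], ..., arr[i]).
    total = 0
    for i, v in enumerate(arr):
        total += v
        arr[i] = total
    return arr

print(inclusive_prefix_sum([1, 2, 3, 4]))  # [1, 3, 6, 10]
```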
Runtime integer overflow detection for addition, subtraction, multiplication, and left-shift operators on the Vulkan, CPU, and CUDA backends is now available when debug mode is on. To use overflow detection on the Vulkan backend, you need to enable printing, and detection of 64-bit multiplication overflow on Vulkan requires NVIDIA driver 510 or higher. (#6178) (#6279) (by Lin Jiang)
For the following program:
import taichi as ti

ti.init(debug=True)

@ti.kernel
def add(a: ti.u64, b: ti.u64) -> ti.u64:
    return a + b

add(2 ** 63, 2 ** 63)
The following warning is printed at runtime:
Addition overflow detected in File "/home/lin/test/overflow.py", line 7, in add:
return a + b
^^^^^
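The check itself is cheap: an unsigned addition overflows exactly when the wrapped result is smaller than one of the operands. A plain-Python sketch of that detection logic (conceptual, not Taichi's actual generated code):

```python
MASK64 = (1 << 64) - 1  # 64-bit wrap-around

def u64_add_overflows(a, b):
    result = (a + b) & MASK64
    # Unsigned addition overflowed iff the result wrapped below an operand.
    return result < a

print(u64_add_overflows(2**63, 2**63))  # True: the case warned about above
print(u64_add_overflows(1, 2))          # False
```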
Printing is now supported on Vulkan backend on Unix/Windows platforms. To enable printing on vulkan backend, follow instructions at https://docs.taichi-lang.org/docs/master/debugging#applicable-backends (#6075) (by Ailing)
Three new examples from community contributors are also merged in this release. They include:
You can view these examples by running `ti example` in the terminal and selecting the corresponding index.
Published by github-actions[bot] about 2 years ago
Highlights:
Full changelog:
Published by github-actions[bot] about 2 years ago
This is a bug fix release for v1.1.0.
Full changelog:
Published by github-actions[bot] about 2 years ago
High-resolution simulations can deliver great visual quality, but are often limited by the capacity of the onboard GPU memory. This release adds quantized data types, allowing you to define your own integers, fixed-point numbers, or floating-point numbers of arbitrary number of bits that may strike a balance between your hardware limits and simulation effects. See Using quantized data types for a comprehensive introduction.
A Taichi kernel is implicitly compiled the first time it is called. The compilation results are kept in an online in-memory cache to reduce the overhead in the subsequent function calls. As long as the kernel function is unchanged, it can be directly loaded and launched. The cache, however, is no longer available when the program terminates. Then, if you run the program again, Taichi has to re-compile all kernel functions and reconstruct the online in-memory cache. And the first launch of a Taichi function is always slow due to the compilation overhead.
To address this problem, this release adds the offline cache feature, which dumps the compilation cache to the disk for future runs. The first launch overhead can be drastically reduced in subsequent runs. Taichi now constructs and maintains an offline cache by default.
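Conceptually, the offline cache is a content-addressed store: compiled artifacts are keyed by a hash of the kernel and its target, and persisted to disk so that a later run with unchanged source loads instead of recompiling. A plain-Python sketch of the idea (hypothetical helper names; not Taichi's on-disk format):

```python
import hashlib
import json
import os
import tempfile

CACHE_DIR = tempfile.mkdtemp()

def cache_key(kernel_src, arch):
    # Any change to the source or the target arch invalidates the entry.
    return hashlib.sha256(f"{arch}:{kernel_src}".encode()).hexdigest()

def compile_or_load(kernel_src, arch, compile_fn):
    path = os.path.join(CACHE_DIR, cache_key(kernel_src, arch) + ".json")
    if os.path.exists(path):           # cache hit: skip compilation entirely
        with open(path) as f:
            return json.load(f)
    artifact = compile_fn(kernel_src)  # cache miss: compile, then persist
    with open(path, "w") as f:
        json.dump(artifact, f)
    return artifact

compilations = []

def slow_compile(src):
    compilations.append(src)
    return {"binary": src.upper()}

compile_or_load("x[i] += 1", "cuda", slow_compile)
compile_or_load("x[i] += 1", "cuda", slow_compile)
print(len(compilations))  # 1: the second call was served from disk
```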
The following table shows the launch overhead of running cornell_box on the CUDA backend with and without offline cache:
| | Time spent on compilation and cached data loading |
|---|---|
| Offline cache disabled | 24.856s |
| Offline cache enabled (1st run) | 25.435s |
| Offline cache enabled (2nd run) | 0.677s |
Note that, for now, the offline cache feature works only on the CPU and CUDA backends. If your code behaves abnormally, disable the offline cache by setting the environment variable `TI_OFFLINE_CACHE=0` or `ti.init(offline_cache=False)`, and file an issue with us on Taichi's GitHub repo. See Offline cache for more information.
Adds forward-mode automatic differentiation via `ti.ad.FwdMode`. Unlike the existing reverse-mode automatic differentiation, which computes the vector-Jacobian product (vJp), forward mode computes the Jacobian-vector product (Jvp) when evaluating derivatives. Therefore, forward-mode automatic differentiation is much more efficient when the number of a function's outputs is greater than the number of its inputs. Read this example, which demonstrates Jacobian matrix computation in forward mode and reverse mode.
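The asymmetry is easiest to see with dual numbers, the classic model of forward mode: each value carries a tangent alongside it, and one pass yields one directional derivative for every output. A plain-Python sketch of the concept (not the `ti.ad.FwdMode` API):

```python
class Dual:
    """Value paired with its tangent d(value)/dx."""

    def __init__(self, val, tan):
        self.val = val
        self.tan = tan

    def __add__(self, other):
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

x = Dual(3.0, 1.0)   # seed the tangent: dx/dx = 1
y = x * x + x        # f(x) = x^2 + x
print(y.val, y.tan)  # f(3) = 12.0, f'(3) = 7.0
```

One forward pass differentiates all outputs with respect to a single input direction, which is why forward mode wins when outputs outnumber inputs.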
GPU shared memory is a small, fast memory region visible within each thread block (or workgroup in Vulkan). It is widely used in scenarios where performance is a crucial concern. To give you access to your GPU's shared memory, this release adds the SharedArray API under the namespace `ti.simt.block`.
The following diagram illustrates the performance benefits of Taichi's SharedArray. With SharedArray, Taichi Lang is comparable to or even outperforms the equivalent CUDA code.
Taichi now supports texture bilinear sampling and raw texel fetch on both Vulkan and OpenGL backends. This feature leverages the hardware texture unit and diminishes the need for manual composition of bilinear interpolation code in image processing tasks. This feature also provides an easy way for texture mapping in tasks such as rasterization or ray-tracing. On Vulkan backend, Taichi additionally supports image load and store. You can directly manipulate texels of an image and use this very image in subsequent texture mapping.
Note that the current texture and image APIs are in the early stages and subject to change. In the future we plan to support bindless textures to extend to tasks such as ray-tracing. We also plan to extend full texture support to all backends that support texture APIs.
Run `ti example simple_texture` to see an example of texture support!
- Fetch the depth buffer of the current scene with `ti.ui.Window.get_depth_buffer(field)`, or as a NumPy array with `ti.ui.Window.get_depth_buffer_as_numpy()`.
- Draw 3D lines with `Scene.lines(vertices, width)`.
- Draw mesh instances: pass a field of transform matrices (`ti.Matrix.field(4, 4, ti.f32, shape=N)`) and call `ti.ui.Scene.mesh_instance(vertices, transforms=TransformMatrixField)` to put various mesh instances at different places.
- Show the wireframe of a mesh when calling `Scene.mesh()` or `Scene.mesh_instance()` by setting `show_wireframe=True`.

Taichi dataclass: Taichi now recommends using the `@ti.dataclass` decorator to define struct types, or even attach functions to them. See Taichi dataclasses for more information.
import math

import taichi as ti
from taichi.math import vec3

ti.init()

@ti.dataclass
class Sphere:
    center: vec3
    radius: ti.f32

    @ti.func
    def area(self):
        # a function to run in taichi scope
        return 4 * math.pi * self.radius * self.radius

    def is_zero_sized(self):
        # a python scope function
        return self.radius == 0.0
As shown in the dataclass example above, `vec2`, `vec3`, and `vec4` in the `taichi.math` module (same for the `ivec` and `uvec` variants) can be used directly as type hints. The numeric precision of these types is determined by `default_ip` or `default_fp` in `ti.init()`.
More flexible instantiation for a `struct` or `dataclass`:

In earlier releases, to instantiate a `taichi.types.struct` or `taichi.dataclass`, you had to explicitly provide a complete list of member-value pairs, like:
ray = Ray(ro=vec3(0), rd=vec3(1, 0, 0), t=1.0)
As of this release, you are given more options. The positional arguments are passed to the struct members in the order they are defined; the keyword arguments set the corresponding struct members. Unspecified struct members are automatically set to zero. For example:
# use positional arguments to set struct members in order
ray = Ray(vec3(0), vec3(1, 0, 0), 1.0)
# ro is set to vec3(0) and t will be set to 0
ray = Ray(vec3(0), rd=vec3(1, 0, 0))
# both ro and rd are set to vec3(0)
ray = Ray(t=1.0)
# ro is set to vec3(1), rd=vec3(0) and t=0.0
ray = Ray(1)
# all members are set to 0.
ray = Ray()
Supports calling `fill()` from both the Python scope and the Taichi scope.

In earlier releases, you could only call `fill()` from the Python scope, where it is a method of the `ScalarField` or `MatrixField` class. As of this release, you can call this method from either the Python scope or the Taichi scope. See the following code snippet:
x = ti.field(int, shape=(10, 10))
x.fill(1)

@ti.kernel
def test():
    x.fill(-1)
More flexible initialization for customized matrix types:

As the following code snippet shows, matrix types created using `taichi.types.matrix()` or `taichi.types.vector()` can be initialized more flexibly: Taichi automatically combines the inputs and converts them to a matrix whose shape matches the shape of the target matrix type.
# mat2 and vec3 are also predefined in the ti.math module;
# they are spelled out here for illustration.
mat2 = ti.types.matrix(2, 2, float)
vec3 = ti.types.vector(3, float)

m = mat2(1)               # [[1., 1.], [1., 1.]]
m = mat2(1, 2, 3, 4)      # [[1., 2.], [3., 4.]]
m = mat2([1, 2], [3, 4])  # [[1., 2.], [3., 4.]]
m = mat2([1, 2, 3, 4])    # [[1., 2.], [3., 4.]]

v = vec3(1, 2, 3)
m = mat2(v, 4)            # [[1., 2.], [3., 4.]]
Makes `ti.f32(x)` syntax sugar for `ti.cast(x, ti.f32)`, provided that `x` is neither a literal nor of a compound data type. The same applies to other primitive types such as `ti.i32`, `ti.u8`, and `ti.f64`.
More convenient axes order adjustment: A common way to improve the performance of a Taichi program is to adjust the order of axes when laying out field data in the memory. In earlier releases, this requires in-depth knowledge about the data definition language (the SNode system) and may become an extra burden in situations where sparse data structures are not required. As of this release, Taichi supports specifying the order of axes when defining a Taichi field.
import taichi as ti

ti.init()
M, N = 64, 32  # example sizes

# Before
x = ti.field(ti.i32)
y = ti.field(ti.i32)
ti.root.dense(ti.i, M).dense(ti.j, N).place(x)  # row-major
ti.root.dense(ti.j, N).dense(ti.i, M).place(y)  # column-major

# New syntax
x = ti.field(ti.i32, shape=(M, N), order='ij')
y = ti.field(ti.i32, shape=(M, N), order='ji')

# SoA vs. AoS example
p = ti.Vector.field(3, ti.i32, shape=(M, N), order='ji', layout=ti.Layout.SOA)
q = ti.Vector.field(3, ti.i32, shape=(M, N), order='ji', layout=ti.Layout.AOS)
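The `order` string only changes how an index tuple maps to a linear memory address; the field's logical shape is unchanged. A plain-Python sketch of the two linearizations (illustrative, not Taichi internals):

```python
def addr_order_ij(i, j, M, N):
    # order='ij' (row-major): the last axis j is contiguous in memory
    return i * N + j

def addr_order_ji(i, j, M, N):
    # order='ji' (column-major): the first axis i is contiguous in memory
    return j * M + i

M, N = 4, 8
print(addr_order_ij(1, 2, M, N))  # 10
print(addr_order_ji(1, 2, M, N))  # 9
```

Choosing the order that matches your innermost loop axis keeps accesses contiguous, which is the usual reason to adjust it.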
- `pow()` has a negative exponent (#5275)
- `ti.ndrange` (#4478)
- `ti.BitpackedFields`
- `ti.from_paddle`
- `ti.to_paddle`
- `ti.FieldsBuilder.lazy_dual`
- `ti.math` module
- `ti.Texture`
- `ti.ref`
- `ti.dataclass`
- `ti.simt.block.SharedArray`
| Old API | New API |
|---|---|
| `ti.clear_all_gradients` | `ti.ad.clear_all_gradients` |
| `ti.Tape` | `ti.ad.Tape` |
| `ti.FieldsBuilder.bit_array` | `ti.FieldsBuilder.quant_array` |
| `ti.ui.Window.write_image` | `ti.ui.Window.save_image` |
| `ti.ui.Window.GUI` | `ti.ui.Window.get_gui` |
- `ti.ui.make_camera`: please construct cameras with `ti.ui.Camera` instead.

As announced in the v1.0.0 release, we no longer provide official Python 3.6 wheels through PyPI. Users who need Taichi with Python 3.6 may still build from source, but its support is not guaranteed.
The `taichi_glsl` package on PyPI will no longer be maintained as of this release. GLSL-related features will be implemented in the official `taichi.math` module, which includes data types and handy functions for daily math and shader development:

- `vec2`, `vec3`, and `vec4`
- `mat2`, `mat3`, and `mat4`
- `step()`, `clamp()`, and `smoothstep()`

Official support for macOS Mojave (10.14, released in 2018) will be dropped starting from v1.2.0. Please upgrade your macOS if possible, or let us know if you have any concerns.
Published by github-actions[bot] over 2 years ago
Highlights:
Full changelog:
OpLoad
with physical address (#5212) (by PENGUINLIONG)Published by github-actions[bot] over 2 years ago
Highlights:
Full changelog:
Published by github-actions[bot] over 2 years ago
Highlights:
The v1.0.2 release is a patch fix that improves Taichi's stability on multiple platforms, especially for GGUI and the Vulkan backend.
Full changelog:
Published by github-actions[bot] over 2 years ago
Highlights:
Full changelog:
Published by github-actions[bot] over 2 years ago
v1.0.0 was released on April 13, 2022.
Taichi's license is changed from MIT to Apache-2.0 after a public vote in #4607.
This release supports Python 3.10 on all supported operating systems (Windows, macOS, and Linux).
Before v1.0.0, Taichi worked only on Linux distributions that support glibc 2.27+ (for example Ubuntu 18.04+). As of v1.0.0, in addition to the normal Taichi wheels, Taichi provides manylinux2014-compatible wheels to work on most modern Linux distributions, including CentOS 7.
Deprecates `ti.ext_arr()` and uses `ti.types.ndarray()` instead. `ti.types.ndarray()` supports both Taichi ndarrays and external arrays, for example NumPy arrays.

By working together with OPPO US Research Center, Taichi delivers Taichi AOT, a solution for deploying kernels in non-Python environments, such as on mobile devices.
Compiled Taichi kernels can be saved from a Python process, then loaded and run by the provided C++ runtime library. With a set of APIs, your Python/Taichi code can be easily deployed in any C++ environment. We demonstrate the simplicity of this workflow by porting the implicit FEM (finite element method) demo released in v0.9.0 to an Android application. Download the Android package and find out what Taichi AOT has to offer! If you want to try out this solution, please also check out the taichi-aot-demo repo.
# In Python app.py
module = ti.aot.Module(ti.vulkan)
module.add_kernel(my_kernel, template_args={'x': x})
module.save('my_app')
The following code snippet shows the C++ workflow for loading the compiled AOT modules.
// Initialize Vulkan program pipeline
taichi::lang::vulkan::VulkanDeviceCreator::Params evd_params;
evd_params.api_version = VK_API_VERSION_1_2;
auto embedded_device =
std::make_unique<taichi::lang::vulkan::VulkanDeviceCreator>(evd_params);
std::vector<uint64_t> host_result_buffer;
host_result_buffer.resize(taichi_result_buffer_entries);
taichi::lang::vulkan::VkRuntime::Params params;
params.host_result_buffer = host_result_buffer.data();
params.device = embedded_device->device();
auto vulkan_runtime = std::make_unique<taichi::lang::vulkan::VkRuntime>(std::move(params));
// Load AOT module saved from Python
taichi::lang::vulkan::AotModuleParams aot_params{"my_app", vulkan_runtime.get()};
auto module = taichi::lang::aot::Module::load(taichi::Arch::vulkan, aot_params);
auto my_kernel = module->get_kernel("my_kernel");
// Allocate device buffer
taichi::lang::Device::AllocParams alloc_params;
alloc_params.host_write = true;
alloc_params.size = /*Ndarray size for `x`*/;
alloc_params.usage = taichi::lang::AllocUsage::Storage;
auto devalloc_x = embedded_device->device()->allocate_memory(alloc_params);
// Execute my_kernel without Python environment
taichi::lang::RuntimeContext host_ctx;
host_ctx.set_arg_devalloc(/*arg_id=*/0, devalloc_x, /*shape=*/{128}, /*element_shape=*/{3, 1});
my_kernel->launch(&host_ctx);
Note that the C++ runtime library currently supports only the Vulkan backend. The Taichi team is working on supporting more backends.
All Taichi functions are inlined into the Taichi kernel during compile time. However, the kernel becomes lengthy and requires longer compile time if it has too many Taichi function calls. This becomes especially obvious if a Taichi function involves compile-time recursion. For example, the following code calculates the Fibonacci numbers recursively:
@ti.func
def fib_impl(n: ti.template()):
    if ti.static(n <= 0):
        return 0
    if ti.static(n == 1):
        return 1
    return fib_impl(n - 1) + fib_impl(n - 2)

@ti.kernel
def fibonacci(n: ti.template()):
    print(fib_impl(n))
In this code, fib_impl() recursively calls itself until n reaches 1 or 0. The total number of calls to fib_impl() increases exponentially as n grows, so the length of the kernel also increases exponentially. When n reaches 25, it takes more than a minute to compile the kernel.
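The blow-up can be illustrated by counting how many function bodies the compiler would inline for a given n. The sketch below is plain Python, not Taichi code; inlined_calls is a hypothetical helper that counts one instantiation per call site:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def inlined_calls(n):
    """Count how many copies of fib_impl get inlined for a given n."""
    if n <= 1:
        return 1  # base cases inline a single body
    # One body for this call, plus everything its two recursive calls inline
    return 1 + inlined_calls(n - 1) + inlined_calls(n - 2)

print(inlined_calls(5))   # 15 inlined bodies already for n = 5
print(inlined_calls(25))  # grows exponentially with n
```

This count follows the Fibonacci recurrence itself, which is why the kernel length (and hence compile time) explodes around n = 25.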
This release introduces "real function", a new type of Taichi function that compiles independently instead of being inlined into the kernel. It is an experimental feature and only supports scalar arguments and scalar return value for now.
You can use it by decorating the function with @ti.experimental.real_func. For example, the following is the real function version of the code above.
@ti.experimental.real_func
def fib_impl(n: ti.i32) -> ti.i32:
    if n <= 0:
        return 0
    if n == 1:
        return 1
    return fib_impl(n - 1) + fib_impl(n - 2)

@ti.kernel
def fibonacci(n: ti.i32):
    print(fib_impl(n))
The length of the kernel does not increase as n grows because the kernel only makes a call to the function instead of inlining the whole function. As a result, the code takes far less than a second to compile regardless of the value of n.
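As a plain-Python sanity check of the runtime recursion (a reference implementation, not Taichi code), the same logic produces the expected Fibonacci values:

```python
def fib(n):
    """Reference Fibonacci, mirroring the real-function example above."""
    if n <= 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

print([fib(n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```

Because a real function is a genuine runtime call, the recursion depth can depend on the argument value, which is impossible for an inlined function.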
The main differences between a normal Taichi function and a real function are listed below:

- A normal Taichi function is inlined into the kernel at compile time, while a real function is compiled separately and called at runtime.
- A real function supports runtime recursion.
- You cannot write return statements inside if/for/while statements in a normal Taichi function, while a real function has no such restriction.

Previously, you could not explicitly give a type to a literal. For example,
@ti.kernel
def foo():
    a = 2891336453  # i32 overflow (>2^31-1)
In the code snippet above, 2891336453 is first turned into the default integer type (ti.i32 if not changed). This causes an overflow. Starting from v1.0.0, you can write type annotations for literals:
@ti.kernel
def foo():
    a = ti.u32(2891336453)  # similar to 2891336453u in C
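To see why the untyped literal is a problem, the i32 wraparound can be reproduced in plain Python (illustrative only; as_i32 is a helper defined here, not a Taichi API):

```python
import struct

def as_i32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack('<i', struct.pack('<I', value & 0xFFFFFFFF))[0]

print(as_i32(2891336453))       # -1403630843: the overflow described above
print(2891336453 <= 2**31 - 1)  # False: the literal does not fit in i32
```

Annotating the literal as ti.u32 avoids this, since 2891336453 fits comfortably in an unsigned 32-bit integer.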
You can use ti.loop_config to control the behavior of the subsequent top-level for-loop. Available parameters are:

- block_dim: Sets the number of threads in a block on GPU.
- parallelize: Sets the number of threads to use on CPU.
- serialize: If you set serialize to True, the for-loop runs serially, and you can write break statements inside it (only applies to range/ndrange for-loops). Setting serialize to True equals setting parallelize to 1.

Here are two examples:
@ti.kernel
def break_in_serial_for() -> ti.i32:
    a = 0
    ti.loop_config(serialize=True)
    for i in range(100):  # This loop runs serially
        a += i
        if i == 10:
            break
    return a

break_in_serial_for()  # returns 55
n = 128
val = ti.field(ti.i32, shape=n)

@ti.kernel
def fill():
    ti.loop_config(parallelize=8, block_dim=16)
    # If the kernel is run on the CPU backend, 8 threads will be used to run it
    # If the kernel is run on the CUDA backend, each block will have 16 threads
    for i in range(n):
        val[i] = i
math module

This release adds a math module to support GLSL-standard vector operations and to make it easier to port GLSL shader code to Taichi. For example, vector types, including vec2, vec3, vec4, mat2, mat3, and mat4, and functions, including mix(), clamp(), and smoothstep(), act similarly to their counterparts in GLSL. See the following examples:
You can use the rgba, xyzw, uvw properties to get and set vector entries:
import taichi.math as tm

@ti.kernel
def example():
    v = tm.vec3(1.0)  # (1.0, 1.0, 1.0)
    w = tm.vec4(0.0, 1.0, 2.0, 3.0)
    v.rgg += 1.0  # v = (2.0, 3.0, 1.0)
    w.zxy += tm.sin(v)
Each Taichi vector is implemented as a column vector. Ensure that you put the matrix before the vector in a matrix-vector multiplication.
@ti.kernel
def example():
    M = ti.Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    v = tm.vec3(1, 2, 3)
    w = (M @ v).xyz  # [1, 2, 3]
@ti.kernel
def example():
    v = tm.vec3(0., 1., 2.)
    w = tm.smoothstep(0.0, 1.0, v.xyz)
    w = tm.clamp(w, 0.2, 0.8)
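For readers unfamiliar with the GLSL definitions these functions follow, the scalar formulas can be sketched in plain Python (an illustrative sketch of the standard GLSL definitions, not the Taichi implementation):

```python
def clamp(x, lo, hi):
    """GLSL-style clamp: constrain x to the range [lo, hi]."""
    return min(max(x, lo), hi)

def mix(a, b, t):
    """GLSL-style mix: linear interpolation between a and b by weight t."""
    return a * (1.0 - t) + b * t

def smoothstep(edge0, edge1, x):
    """GLSL-style smoothstep: cubic Hermite interpolation between edges."""
    t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

print(clamp(2.5, 0.2, 0.8))       # 0.8
print(mix(0.0, 10.0, 0.25))       # 2.5
print(smoothstep(0.0, 1.0, 0.5))  # 0.5
```

The Taichi versions apply the same formulas componentwise to vectors, which is what the example above relies on.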
ti gallery

This release introduces a CLI command ti gallery, allowing you to select and run Taichi examples in a pop-up window. To do so, run:

ti gallery

A window then pops up.
As of v1.0.0, Taichi accepts matrix or vector types as parameters and return values. You can use ti.types.matrix or ti.types.vector as the type annotations.

Taichi also supports basic, read-only matrix slicing. Use the mat[:, :] syntax to quickly retrieve a specific portion of a matrix. See Slicings for more information.

The following code example shows how to get the numbers in the four corners of a 3x3 matrix mat:
import taichi as ti

ti.init()

@ti.kernel
def foo(mat: ti.types.matrix(3, 3, ti.i32)) -> ti.types.matrix(2, 2, ti.i32):
    corners = mat[::2, ::2]
    return corners

mat = ti.Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
corners = foo(mat)  # [[1 3] [7 9]]
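The slice mat[::2, ::2] keeps rows 0 and 2 and, within each kept row, columns 0 and 2. The same selection can be reproduced on a plain Python nested list (illustrative only, not Taichi code):

```python
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Step through the rows with stride 2, then through each kept row's
# columns with stride 2 — exactly what mat[::2, ::2] selects above.
corners = [row[::2] for row in mat[::2]]
print(corners)  # [[1, 3], [7, 9]]
```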
Note that in a slice, the lower bound, the upper bound, and the stride must be constant integers. If you want to use a variable index together with a slice, set ti.init(dynamic_index=True). For example:
import taichi as ti

ti.init(dynamic_index=True)

@ti.kernel
def foo(mat: ti.types.matrix(3, 3, ti.i32), ind: ti.i32) -> ti.types.matrix(3, 1, ti.i32):
    col = mat[:, ind]
    return col

mat = ti.Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
col = foo(mat, 2)  # [3 6 9]
Flexibility is key to the user experience of an automatic-differentiation (AD) system. Before v1.0.0, Taichi's AD system required that a differentiable Taichi kernel consist only of multiple simply nested for-loops (shown in task1 below). This was once called the Kernel Simplicity Rule (KSR). KSR prevented Taichi's users from writing differentiable kernels with multiple serial for-loops (shown in task2 below) or with a mixture of serial for-loops and non-for statements (shown in task3 below).
# OK: multiple simply nested for-loops
@ti.kernel
def task1():
    for i in range(2):
        for j in range(3):
            for k in range(3):
                y[None] += x[None]

# Error: multiple serial for-loops
@ti.kernel
def task2():
    for i in range(2):
        for j in range(3):
            y[None] += x[None]
        for j in range(3):
            y[None] += x[None]

# Error: a mixture of serial for-loop and non-for
@ti.kernel
def task3():
    for i in range(2):
        y[None] += x[None]
        for j in range(3):
            y[None] += x[None]
With KSR removed in this release, code with different kinds of for-loop structures can be differentiated, as shown in the snippet below.
# OK: a complicated control flow that is still differentiable in Taichi
for j in range(2):
    for i in range(3):
        y[None] += x[None]

for i in range(3):
    for ii in range(2):
        y[None] += x[None]
    for iii in range(2):
        y[None] += x[None]
    for iv in range(2):
        y[None] += x[None]

for i in range(3):
    for ii in range(2):
        for iii in range(2):
            y[None] += x[None]
Taichi provides a demo to demonstrate how to implement a differentiable simulator using this enhanced Taichi AD system.
assert statement

This release supports including an f-string in an assert statement as an error message. You can include scalar variables in the f-string. See the example below:
import taichi as ti

ti.init(debug=True)

@ti.kernel
def assert_is_zero(n: ti.i32):
    assert n == 0, f"The number is {n}, not zero"

assert_is_zero(42)  # TaichiAssertionError: The number is 42, not zero
Note that the assert statement works only in debug mode.
This release comes with the first version of the Taichi language specification, which attempts to provide an exhaustive description of the syntax and semantics of the Taichi language. It serves as a reference for Taichi's users and developers when they need to determine whether a specific behavior is correct, buggy, or undefined.
| Deprecated | Replaced by |
| --- | --- |
| ti.ext_arr() | ti.types.ndarray() |
Published by github-actions[bot] over 2 years ago
Highlights:
Full changelog:
Published by rexwangcc over 2 years ago
Highlights:
Full changelog: