Commit Graph

157 Commits

Author SHA1 Message Date
Magnus Ulimoen 3edd18c4fd Use core_simd over packed_simd 2021-07-27 18:32:22 +00:00
Magnus Ulimoen 8873f458b4 useless conversion 2021-06-29 18:07:26 +02:00
Magnus Ulimoen 9e2ce3ae24 Add Metrics iterator 2021-03-26 16:27:45 +01:00
Magnus Ulimoen a33e1d37ba Document what compiler is doing for diffxi 2021-03-25 22:39:00 +01:00
Magnus Ulimoen 76f5291131 Elide bounds check in diffxi
array_windows.skip did not elide bounds checks as it should. If
the slice is instead offset by the skipped amount, the behaviour
is the same, but the compiler gets enough help to elide them.
The two changed lines allow SIMD optimisations, reducing the
instruction count in the benchmark by an impressive two thirds.
2021-03-25 17:23:01 +01:00
Magnus Ulimoen 4ae5c02bb1 Replace FastFloat with mul_add 2021-03-23 19:21:38 +01:00
Magnus Ulimoen 7aadda3de9 Move integrate to separate crate 2021-03-22 17:49:35 +01:00
Magnus Ulimoen be1330ec02 Add constrmatrix as separate crate 2021-03-22 16:24:32 +01:00
Magnus Ulimoen 502679c9a1 Move Float to separate crate 2021-03-22 16:17:27 +01:00
Magnus Ulimoen be984fbdac Bump sprs to 0.10 2021-03-18 23:27:03 +01:00
Magnus Ulimoen 8383517ba3 ensure slice can be cast to Matrix 2021-03-15 20:18:19 +01:00
Magnus Ulimoen 17ab18e953 zero-pad diffxi kernel 2021-03-15 20:07:41 +01:00
Magnus Ulimoen e43e71a4d8 Make flip_XX impl on Matrix 2021-03-15 19:31:41 +01:00
Magnus Ulimoen 6fc045ae17 Replace transmute with cast 2021-02-12 19:02:13 +01:00
Magnus Ulimoen a02c7daafc remove iterator inhibiting optimisation 2021-02-10 21:17:05 +01:00
Magnus Ulimoen 02175d1734 use some unsafe... 2021-02-10 19:29:26 +01:00
Magnus Ulimoen 8a6dc60edf remove some unsafe from simd 2021-02-10 19:02:48 +01:00
Magnus Ulimoen 87c055f81e Add back simd column algo 2021-02-09 21:44:35 +01:00
Magnus Ulimoen cf4d8f1e9b add Zero for constmatrix 2021-02-03 08:41:11 +01:00
Magnus Ulimoen 7ec426b5a8 add more fast intrinsics 2021-02-03 08:31:33 +01:00
Magnus Ulimoen 64a4e92dd2 specialise on contiguous ny 2021-02-02 00:33:37 +01:00
Magnus Ulimoen c709cf465e remove ndarray transmute 2021-02-02 00:12:03 +01:00
Magnus Ulimoen 74d99a4a18 try ndarray transmute 2021-02-02 00:12:03 +01:00
Magnus Ulimoen b15ea57e6d inline for perf 2021-02-02 00:12:03 +01:00
Magnus Ulimoen 299b4f8083 ensure FastFloat flag works 2021-02-02 00:12:03 +01:00
Magnus Ulimoen 6f7268bf33 use matrices everywhere 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 31ac46e386 move data structs into separate files 2021-02-02 00:12:02 +01:00
Magnus Ulimoen f7f8a7ffff make flip const functions 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 1f3aa2c116 make core-intrinsics cfg'ed 2021-02-02 00:12:02 +01:00
Magnus Ulimoen c73c6e7407 diff_op_col_naive_matrix 2021-02-02 00:12:02 +01:00
Magnus Ulimoen c660354c3f Matrix for Upwind4 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 45e4d51513 remove a closure 2021-02-02 00:12:02 +01:00
Magnus Ulimoen b0e1ec62f8 change order in matmul_into 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 481f2d607e remove a lot of unsafe, lost perf 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 00f3ba6a01 use split_at_mut 2021-02-02 00:12:02 +01:00
Magnus Ulimoen db94caf2b2 change repr of Matrix 2021-02-02 00:12:02 +01:00
Magnus Ulimoen c133557459 use Matrix in SBP diff 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 3c7cc4605a simplify traits 2021-02-02 00:12:02 +01:00
Magnus Ulimoen f7c238f6a7 add blockend SBP8 2021-02-02 00:12:02 +01:00
Magnus Ulimoen a7660281c8 add inline to remove magic fix 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 14fefe97ab add Matrix approach for SBP8 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 3d34f2e7a0 add fast-float feature 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 4c2daf5933 minor thingys 2021-02-02 00:12:02 +01:00
Magnus Ulimoen db552af4ff add blockend with weird caveat 2021-02-02 00:12:02 +01:00
Magnus Ulimoen bcda26a512 15% reductions in instr count 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 36293e75e6 10% instr reduction with fast_ intr 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 30c563c19d working d1 SBP4 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 94e8fb5b7c checkpoint 2021-02-02 00:12:02 +01:00
Magnus Ulimoen c104082ac0 add matrix type 2021-02-02 00:12:02 +01:00
Magnus Ulimoen 1069bad145 bugfix sparse algo visibility 2021-01-25 20:53:56 +01:00
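
Note on commit 76f5291131: its message describes replacing a skip on the window iterator with an offset of the underlying slice. The sketch below is a minimal illustration of that rewrite and is not the code from the repository. The function names, the three-point kernel and the use of the stable `windows` method (rather than `array_windows`, which the commit names) are assumptions made for illustration. It only shows that the two forms produce the same values; the bounds-check and SIMD effects reported in the commit depend on the compiler and are not demonstrated here.

```rust
/// Hypothetical three-point stencil that skips the first `skip` windows
/// with the iterator adaptor (the form the commit moved away from).
fn stencil_skip_iter(data: &[f32], kernel: &[f32; 3], skip: usize, out: &mut [f32]) {
    for (o, w) in out.iter_mut().zip(data.windows(3).skip(skip)) {
        *o = w[0] * kernel[0] + w[1] * kernel[1] + w[2] * kernel[2];
    }
}

/// Same stencil, but the slice itself is offset by `skip` before the
/// windows are taken (the form the commit describes).
fn stencil_offset_slice(data: &[f32], kernel: &[f32; 3], skip: usize, out: &mut [f32]) {
    // An out-of-range offset gives an empty slice, matching `skip` on the
    // iterator, which would simply yield nothing.
    let data = data.get(skip..).unwrap_or(&[]);
    for (o, w) in out.iter_mut().zip(data.windows(3)) {
        *o = w[0] * kernel[0] + w[1] * kernel[1] + w[2] * kernel[2];
    }
}
```

Skipping `n` windows and starting the slice `n` elements later visit the same windows, so the two functions can be swapped without changing results. Whether the second form compiles to better code has to be checked per compiler version, for example by comparing instruction counts in a benchmark as the commit does.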