Age | Commit message (Collapse) | Author |
---|
| [Bug #20522] If `Warning.warn` is redefined in Ruby, emitting a warning would invoke Ruby code, which can't safely be done when YJIT is compiling. |
| They were initially made frozen to avoid false positives for cases such as: str = str.dup if str.frozen? But this may cause bugs and is generally confusing for users. [Feature #20205] Co-authored-by: Jean Boussier <[email protected]> |
| Instructions for this code: ```ruby # frozen_string_literal: true [a].pack("C") ``` Before this commit: ``` == disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)> 0000 putself ( 3)[Li] 0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE> 0003 newarray 1 0005 putobject "C" 0007 opt_send_without_block <calldata!mid:pack, argc:1, ARGS_SIMPLE> 0009 leave ``` After this commit: ``` == disasm: #<ISeq:<main>@test.rb:1 (1,0)-(3,13)> 0000 putself ( 3)[Li] 0001 opt_send_without_block <calldata!mid:a, argc:0, FCALL|VCALL|ARGS_SIMPLE> 0003 putobject "C" 0005 opt_newarray_send 2, :pack 0008 leave ``` Co-authored-by: Maxime Chevalier-Boisvert <[email protected]> Co-authored-by: Aaron Patterson <[email protected]> |
| * YJIT: Add specialized codegen function for `TrueClass#===` TrueClass#=== is currently number 10 in the most frequent C calls list of the lobsters benchmark. ``` require "benchmark/ips" def wrap true === true true === false true === :x end Benchmark.ips do |x| x.report(:wrap) do wrap end end ``` ``` before Warming up -------------------------------------- wrap 1.791M i/100ms Calculating ------------------------------------- wrap 17.806M (± 1.0%) i/s - 89.544M in 5.029363s after Warming up -------------------------------------- wrap 4.024M i/100ms Calculating ------------------------------------- wrap 40.149M (± 1.1%) i/s - 201.223M in 5.012527s ``` Co-authored-by: Maxime Chevalier-Boisvert <[email protected]> Co-authored-by: Takashi Kokubun (k0kubun) <[email protected]> Co-authored-by: Kevin Menard <[email protected]> Co-authored-by: Alan Wu <[email protected]> * Fix the new test for RJIT --------- Co-authored-by: Maxime Chevalier-Boisvert <[email protected]> Co-authored-by: Takashi Kokubun (k0kubun) <[email protected]> Co-authored-by: Kevin Menard <[email protected]> Co-authored-by: Alan Wu <[email protected]> |
| * Revert "Revert "YJIT: Optimize local variables when EP == BP" (#10584)" This reverts commit c8783441952217c18e523749c821f82cd7e5d222. * YJIT: Take care of GC references in ISEQ invariants Co-authored-by: Alan Wu <[email protected]> --------- Co-authored-by: Alan Wu <[email protected]> |
| Add a specialized codegen function for `Class#superclass`. Co-authored-by: Maxime Chevalier-Boisvert <[email protected]> Co-authored-by: Takashi Kokubun (k0kubun) <[email protected]> Co-authored-by: Randy Stauner <[email protected]> Co-authored-by: Alan Wu <[email protected]> |
| This reverts commit 4cc58ea0b865f2fd20f1e881ddbd4c4fab0b072c. Since the change landed call-threshold=1 CI runs have been timing out. There has also been `verify-ctx` violations. Revert for now while we debug. |
| |
| [Feature #20205] As a path toward enabling frozen string literals by default in the future, this commit introduce "chilled strings". From a user perspective chilled strings pretend to be frozen, but on the first attempt to mutate them, they lose their frozen status and emit a warning rather than to raise a `FrozenError`. Implementation wise, `rb_compile_option_struct.frozen_string_literal` is no longer a boolean but a tri-state of `enabled/disabled/unset`. When code is compiled with frozen string literals neither explictly enabled or disabled, string literals are compiled with a new `putchilledstring` instruction. This instruction is identical to `putstring` except it marks the String with the `STR_CHILLED (FL_USER3)` and `FL_FREEZE` flags. Chilled strings have the `FL_FREEZE` flag as to minimize the need to check for chilled strings across the codebase, and to improve compatibility with C extensions. Notes: - `String#freeze`: clears the chilled flag. - `String#-@`: acts as if the string was mutable. - `String#+@`: acts as if the string was mutable. - `String#clone`: copies the chilled flag. Co-authored-by: Jean Boussier <[email protected]> |
| |
| This frees FL_USER0 on both T_MODULE and T_CLASS. Note: prior to this, FL_SINGLETON was never set on T_MODULE, so checking for `FL_SINGLETON` without first checking that `FL_TYPE` was `T_CLASS` was valid. That's no longer the case. |
| |
| `test_keyword.rb` caught this issue. Just need to run with `threshold=1` |
| Usually we deal with splats by speculating that they're of a specific size. In this case, the C method takes a pointer and a length, so we can support changing sizes just fine. |
| This is designed to replace the newarraykwsplat instruction, which is no longer used in the parse.y compiler after this commit. This avoids an unnecessary array allocation in the case where ARGSCAT is followed by LIST with keyword: ```ruby a = [] kw = {} [*a, 1, **kw] ``` Previous Instructions: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 newhash 0 ( 2)[Li] 0006 setlocal_WC_0 kw@1 0008 getlocal_WC_0 a@0 ( 3)[Li] 0010 splatarray true 0012 putobject_INT2FIX_1_ 0013 putspecialobject 1 0015 newhash 0 0017 getlocal_WC_0 kw@1 0019 opt_send_without_block <calldata!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE> 0021 newarraykwsplat 2 0023 concattoarray 0024 leave ``` New Instructions: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 newhash 0 ( 2)[Li] 0006 setlocal_WC_0 kw@1 0008 getlocal_WC_0 a@0 ( 3)[Li] 0010 splatarray true 0012 putobject_INT2FIX_1_ 0013 pushtoarray 1 0015 putspecialobject 1 0017 newhash 0 0019 getlocal_WC_0 kw@1 0021 opt_send_without_block <calldata!mid:core#hash_merge_kwd, argc:2, ARGS_SIMPLE> 0023 pushtoarraykwsplat 0024 leave ``` pushtoarraykwsplat is designed to be simpler than newarraykwsplat. It does not take a variable number of arguments from the stack, it pops the top of the stack, and appends it to the second from the top, unless the top of the stack is an empty hash. During this work, I found the ARGSPUSH followed by HASH with keyword did not compile correctly, as it pushed the generated hash to the array even if the hash was empty. This fixes the behavior, to use pushtoarraykwsplat instead of pushtoarray in that case: ```ruby a = [] kw = {} [*a, **kw] [{}] # Before [] # After ``` This does not remove the newarraykwsplat instruction, as it is still referenced in the prism compiler (which should be updated similar to this), YJIT (only in the bindings, it does not appear to be implemented), and RJIT (in a couple comments). After those are updated, the newarraykwsplat instruction should be removed. |
| This is the same optimization as e4272fd29 ("Avoid allocation when passing no keywords to anonymous kwrest methods") but for YJIT. For anonymous kwrest parameters, nil is just as good as an empty hash. On the usage side, update `splatkw` to handle `nil` with a leaner path. |
| |
| * Specialize String#byteslice(a, b) This adds a specialization for String#byteslice when there are two parameters. This makes our protobuf parser go from 5.84x slower to 5.33x slower ``` Comparison: decode upstream (53738 bytes): 7228.5 i/s decode protobuff (53738 bytes): 1236.8 i/s - 5.84x slower Comparison: decode upstream (53738 bytes): 7024.8 i/s decode protobuff (53738 bytes): 1318.5 i/s - 5.33x slower ``` * Update yjit/src/codegen.rs --------- Co-authored-by: Maxime Chevalier-Boisvert <[email protected]> |
| Now that `...` uses `**kwrest` instead of regular splat and ruby2keywords, we need to support these type of methods to support `...` well. |
| |
| * YJIT: Add codegen for Float arithmetics * Add Flonum and Fixnum tests |
| |
| |
| This instruction is similar to concattoarray, but it takes the number of arguments to push to the array, removes that number of arguments from the stack, and adds them to the array now at the top of the stack. This allows `f(*a, 1)` to allocate only a single array on the caller side (which can be reused on the callee side in the case of `def f(*a)`). Prior to this commit, `f(*a, 1)` would generate 3 arrays: * a dupped by splatarray true * 1 wrapped in array by newarray * a dupped again by concatarray Instructions Before for `a = []; f(*a, 1)`: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 putself 0005 getlocal_WC_0 a@0 0007 splatarray true 0009 putobject_INT2FIX_1_ 0010 newarray 1 0012 concatarray 0013 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|FCALL> 0015 leave ``` Instructions After for `a = []; f(*a, 1)`: ``` 0000 newarray 0 ( 1)[Li] 0002 setlocal_WC_0 a@0 0004 putself 0005 getlocal_WC_0 a@0 0007 splatarray true 0009 putobject_INT2FIX_1_ 0010 pushtoarray 1 0012 opt_send_without_block <calldata!mid:f, argc:1, ARGS_SPLAT|ARGS_SPLAT_MUT|FCALL> 0014 leave ``` With these changes, method calls to Ruby methods should implicitly allocate at most one array. Ignore typeprof bundled gem failure due to unrecognized instruction. |
| This flag is set when the caller has already created a new array to handle a splat, such as for `f(*a, b)` and `f(*a, *b)`. Previously, if `f` was defined as `def f(*a)`, these calls would create an extra array on the callee side, instead of using the new array created by the caller. This modifies `setup_args_core` to set the flag whenver it would add a `splatarray true` instruction. However, when `splatarray true` is changed to `splatarray false` in the peephole optimizer, to avoid unnecessary allocations on the caller side, the flag must be removed. Add `optimize_args_splat_no_copy` and have the peephole optimizer call that. This significantly simplifies the related peephole optimizer code. On the callee side, in `setup_parameters_complex`, set `args->rest_dupped` to true if the flag is set. This takes a similar approach for optimizing regular splats that was previiously used for keyword splats in d2c41b1bff1f3102544bb0d03d4e82356d034d33 (via VM_CALL_KW_SPLAT_MUT). |
| For receiver with a singleton class, there are multiple vectors YJIT can end up retaining the object. There is a path in jit_guard_known_klass() that bakes the receiver into the code, and the object could also be kept alive indirectly through a path starting at the CME object baked into the code. To avoid these s, avoid compiling calls on objects with a singleton class. See: https://.com/Shopify/ruby/issues/552 [Bug #20209] |
| YJIT didn't guard for ruby2_keywords hash in case of splat calls that land in methods with a rest parameter, creating incorrect results. The compile-time checks didn't correspond to any actual effects of ruby2_keywords, so it was masking this bug and YJIT was needlessly refusing to compile some code. About 16% of fallback reasons in `lobsters` was due to the ISeq check. We already handle the tagging part with exit_if_supplying_kw_and_has_no_kw() and should now have a dynamic guard for all splat cases. Note for backporting: You also need 7f51959ff1. [Bug #20195] |
| * YJIT: Allow inlining ISEQ calls with a block * Leave a TODO comment about u16 inline_block |
| * YJIT: Optimize defined?(yield) * Remove an irrelevant comment * s/get/gen/ |
| to BUILTIN_ATTR_SINGLE_NOARG_LEAF The attribute was created when the other attribute was called BUILTIN_ATTR_INLINE. Now that the original attribute is renamed to BUILTIN_ATTR_LEAF, it's only confusing that we call it "_INLINE". |
| The thing that has used this in the past was very buggy, and we've never revisied it. Let's remove it until we need it again. |
| ``` warning: unused import: `condition::Condition` --> src/asm/arm64/arg/mod.rs:13:9 | 13 | pub use condition::Condition; | ^^^^^^^^^^^^^^^^^^^^ | = note: `#[warn(unused_imports)]` on by default warning: unused import: `rb_yjit_fix_mul_fix as rb_fix_mul_fix` --> src/cruby.rs:188:9 | 188 | pub use rb_yjit_fix_mul_fix as rb_fix_mul_fix; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ warning: unused import: `rb_insn_len as raw_insn_len` --> src/cruby.rs:142:9 | 142 | pub use rb_insn_len as raw_insn_len; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = note: `#[warn(unused_imports)]` on by default ``` Make asm public so it stops warning about unused public stuff in there. |
| Previously, block.to_proc was called first, by vm_caller_setup_arg_block. kw.to_hash was called later inside CALLER_SETUP_ARG or setup_parameters_complex. This adds a splatkw instruction that is inserted before sends with ARGS_BLOCKARG and KW_SPLAT and without KW_SPLAT_MUT. This is not needed in the KW_SPLAT_MUT case, because then you know the value is a hash, and you don't need to call to_hash on it. The splatkw instruction checks whether the second to top block is a hash, and if not, replaces it with the value of calling to_hash on it (using rb_to_hash_type). As it is always before a send with ARGS_BLOCKARG and KW_SPLAT, second to top is the keyword splat, and top is the passed block. |
| Too complex classes use a hash table to store ivs, and should always pin their IVs. We shouldn't touch those classes in compaction. |
| Right now the `rb_shape_get_next` shape caller need to first check if there is capacity left, and if not call `rb_shape_transition_shape_capa` before it can call `rb_shape_get_next`. And on each of these it needs to checks if we got a TOO_COMPLEX back. All this logic is duplicated in the interpreter, YJIT and RJIT. Instead we can have `rb_shape_get_next` do the capacity transition when needed. The caller can compare the old and new shapes capacity to know if resizing is needed. It also can check for TOO_COMPLEX only once. |
| |
| This reverts commit e3afc212ec059525fe4e5387b2a3be920ffe0f0e. |
| Previously, the version-controlled `cruby_bindings.inc.rs` file contained the build-time artifact `id.h`, which nobu mentioned hinders the goal of having fewer magic numbers in the repository. Lookup the IDs YJIT needs on boot. It costs cycles, but it's fine since YJIT only uses a handful of IDs at the moment. No perceptible degradation to boot time found in my testing. |
| Given `SHAPE_MAX_NUM_IVS 80`, we transition to TOO_COMPLEX way before we could overflow a 8bit counter. This reduce the size of `rb_shape_t` from 32B to 24B. If we decide to raise `SHAPE_MAX_NUM_IVS` we can always increase that type again. |
| This way the groth factor is encapsulated, which allows rb_shape_transition_shape_capa to be smarter about ideal sizes. |
| |
| Previously, TestStack#test_machine_stack_size failed pretty consistently on ARM64 macOS, with Rust code and part of the interpreter used for per-instruction fallback (rb_vm_invokeblock() and friends) touching the stack guard page and crashing with SEGV. I've also seen the same test fail on x64 Linux, though with a different symptom. Notes: Merged: https://.com/ruby/ruby/pull/8443 Merged-By: XrXr |
| * YJIT: implement side chain fallback for setlocal to avoid exiting * Update yjit/src/codegen.rs Co-authored-by: Takashi Kokubun <[email protected]> --------- Co-authored-by: Takashi Kokubun <[email protected]> Notes: Merged-By: maximecb <[email protected]> |
| Notes: Merged-By: maximecb <[email protected]> |
| |
| * YJIT: Count throw instructions for each tag * Show % of each throw type Notes: Merged-By: k0kubun <[email protected]> |
| Co-authored-by: Maxime Chevalier-Boisvert <[email protected]> Notes: Merged-By: k0kubun <[email protected]> |
| Notes: Merged-By: maximecb <[email protected]> |
| Notes: Merged: https://.com/ruby/ruby/pull/8124 |
| Notes: Merged-By: k0kubun <[email protected]> |