Bug #19969
closed
Updated by nobu (Nobuyoshi Nakada) over 1 year ago
Might https://.com/nobu/ruby/tree/rehash-after-delete help?
Updated by nobu (Nobuyoshi Nakada) over 1 year ago
- Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 3.0: DONTNEED, 3.1: REQUIRED, 3.2: REQUIRED
Updated by Eregon (Benoit Daloze) over 1 year ago
Right, @nobu's approach seems much better than reintroducing that weird behavior for .dup.
Ideally we wouldn't rehash, as in calling the key.hash methods again, but instead just shrink the internal data structure (and the same when growing it).
Updated by Eregon (Benoit Daloze) over 1 year ago
So apparently some applications were relying on Set#dup/Hash#dup to act like C++'s shrink_to_fit.
Ruby does not have such a method and it feels quite low-level, so it seems better to resize the internal data structure when removing elements/entries and going below some threshold.
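To make the idea concrete, here is a minimal sketch of "shrink when going below some threshold" that reuses cached hash values instead of calling key.hash again. ToyTable, MIN_CAPACITY, and the one-quarter occupancy threshold are all hypothetical choices for illustration; this is not how st_table is implemented, and lookup is a deliberately naive linear scan so the resize policy stays visible.

```ruby
# Toy insertion-ordered table (hypothetical, NOT st_table).
# Each entry stores the key's hash value, so compaction never calls key.hash.
class ToyTable
  Entry = Struct.new(:hash_value, :key)
  MIN_CAPACITY = 4 # assumed lower bound for the sketch

  attr_reader :capacity

  def initialize
    @capacity = MIN_CAPACITY
    @entries = [] # insertion-ordered slots; nil marks a deleted slot
    @live = 0
  end

  def size
    @live
  end

  def add(key)
    h = key.hash
    return self if find_index(h, key)
    grow if @entries.size == @capacity
    @entries << Entry.new(h, key)
    @live += 1
    self
  end

  def delete(key)
    i = find_index(key.hash, key)
    return self unless i
    @entries[i] = nil
    @live -= 1
    shrink_if_sparse
    self
  end

  def include?(key)
    !find_index(key.hash, key).nil?
  end

  private

  # Linear scan comparing the cached hash first, then eql?.
  def find_index(h, key)
    @entries.index { |e| e && e.hash_value == h && e.key.eql?(key) }
  end

  # Reclaim deleted slots first; double only if the table is still full.
  def grow
    @entries.compact!
    @capacity *= 2 if @entries.size == @capacity
  end

  # Assumed policy: once a quarter or less of the capacity is live,
  # drop deleted slots and halve the capacity until that is no longer true.
  def shrink_if_sparse
    return if @capacity <= MIN_CAPACITY || @live * 4 > @capacity
    @entries.compact!
    @capacity /= 2 while @capacity > MIN_CAPACITY && @live * 4 <= @capacity
  end
end

table = ToyTable.new
1000.times { |i| table.add(i) }
puts table.capacity # 1024
999.times { |i| table.delete(i) }
puts table.capacity # 4
```

The only key.hash calls are the ones made by add, delete, and include? on their own argument; shrinking itself only moves existing entries.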
Updated by Eregon (Benoit Daloze) over 1 year ago
As a note, this repro code is very "lucky" to trigger a dup after removing 99.99% of the elements.
I suppose it's done that way to make the effect very clear though.
Without the - [0], the same problem occurs on 3.0:
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); a=Array.new(10000) { s1 - s2 }; GC.start; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.0.6p216 (2023-03-30 revision 23a532679b) [x86_64-linux]
3015808
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); a=Array.new(10000) { s1 - s2 - [0] }; GC.start; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.0.6p216 (2023-03-30 revision 23a532679b) [x86_64-linux]
74552
If a Set is kept alive for a long time, one way to ensure it uses the minimum amount of space is Set#reset, at the cost of extra time to reset/rehash (which notably calls #hash for every key). It's a time vs. memory trade-off that can be worth it for big long-lived sets:
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); a=Array.new(10000) { s=s1 - s2 - [0]; s.reset; s }; GC.start; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
62992
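The time side of that trade-off can be observed with a small sketch that counts #hash calls during Set#reset. CountingKey is a hypothetical helper written only for this illustration; the exact number of calls is an implementation detail, so the sketch only expects at least one call per element.

```ruby
require "set"

# Hypothetical key type that counts how often #hash is invoked.
class CountingKey
  class << self
    attr_accessor :hash_calls
  end
  self.hash_calls = 0

  attr_reader :id

  def initialize(id)
    @id = id
  end

  def hash
    self.class.hash_calls += 1
    @id.hash
  end

  def eql?(other)
    other.is_a?(CountingKey) && other.id == @id
  end
  alias == eql?
end

keys = Array.new(100) { |i| CountingKey.new(i) }
set = Set.new(keys)

CountingKey.hash_calls = 0  # ignore the calls made while building the set
set.reset                   # reindexes the set in place and returns self
puts CountingKey.hash_calls # expected to be at least set.size
```

For keys with an expensive #hash, this cost is paid once per element on every reset, which is why it mostly makes sense for big sets that live a long time.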
Automatic shrinking (PR at https://.com/ruby/ruby/pull/8748) should help the worst cases like the repro so that seems good anyway.
Updated by nobu (Nobuyoshi Nakada) over 1 year ago
- Status changed from Open to Closed
Applied in changeset git|9eac9d71786a8dbec520d0541a91149f01adf8ea.
[Bug ] Compact st_table after deleted if possible
Updated by nagachika (Tomoyuki Chikanaga) over 1 year ago
- Backport changed from 3.0: DONTNEED, 3.1: REQUIRED, 3.2: REQUIRED to 3.0: DONTNEED, 3.1: REQUIRED, 3.2: DONE
ruby_3_2 1cc38d5a2f84733e1c2e42548639e2891fe61e69 merged revision(s) 9eac9d71786a8dbec520d0541a91149f01adf8ea.
Updated by hsbt (Hiroshi SHIBATA) over 1 year ago
Thanks nobu and nagachika.
I confirmed that this regression is resolved on the ruby_3_2 branch.
# Before
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); Array.new(10000) { s1 - s2 - [0] }; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.2.2 (2023-07-05 revision 2f603bc4d7) +YJIT [arm64-darwin23]
4564304
# After
$ ruby -v -rset -e 's1 = Set.new(10000.times); s2 = Set.new(9999.times); Array.new(10000) { s1 - s2 - [0] }; puts `ps -o rss= -p #{$$}`.to_i'
ruby 3.2.2 (2023-11-19 revision d9f4f321c6) +YJIT [arm64-darwin23]
40864
Updated by usa (Usaku NAKAMURA) over 1 year ago
- Backport changed from 3.0: DONTNEED, 3.1: REQUIRED, 3.2: DONE to 3.0: DONTNEED, 3.1: DONE, 3.2: DONE
ruby_3_1 1cae5e7ceaca7304108fdec35d4858a9e4ff7fe0 merged revision(s) 9eac9d71786a8dbec520d0541a91149f01adf8ea.