RUBY is a challenging story
Date
30 oct 2024, 21:36 - setup of the initial document

Ruby is, together with TCL, Python, Perl and PHP, a SECOND generation language, followed by Java and C#, which I call THIRD generation languages.

The SECOND generation languages all have at least one design bug that leads to increasingly tragic consequences over the years.

  1. With TCL, it was the lack of an OO implementation, which is explicitly NOT part of the language core.
  2. With Python, it was the lack of encapsulation, which resulted in unusable THREAD support.
  3. With Ruby, it was the lack of parallelism and the attempt to implement the long-jump-on-error along with an aggressive garbage-collection-without-ref-counting.

PERFORMANCE

The performance of Ruby in executed code is good overall. By far the biggest weakness is the slow startup, which, together with the lack of parallel execution, has consequences for a server application.

The performance is measured with the performance-test-suite from theLink.

RUBY

setup

  • release and aggressive have --spawn and --fork as startup.
  • release is built with shared libraries and aggressive is built with static libraries.
  • thread support is not used.

--send-X

  • the --send-XXX speed is fast, close to C and C++, faster than Python.
  • the --send-nothing is 5% slower than Python; this is the cost of the Ruby-specific overhead for long-jump protection on error.

--parent

  • interpreter startup is very slow, Ruby is not really useful in a spawn setup.
  • The usable --fork performance is a real advantage for Ruby compared to Python.
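The difference between the two startup modes can be made visible with a few lines of Ruby. This is only a rough sketch and not the performance-test-suite from theLink: the helper name startup_times and the number of runs are assumptions, and it needs a platform that supports fork.

```ruby
require "benchmark"
require "rbconfig"

# Hypothetical helper: average time to start a fresh interpreter (spawn)
# compared with forking the interpreter that is already running (fork).
def startup_times(runs = 5)
  spawn_time = Benchmark.realtime do
    runs.times do
      pid = Process.spawn(RbConfig.ruby, "-e", "")  # new interpreter, empty script
      Process.wait(pid)
    end
  end

  fork_time = Benchmark.realtime do
    runs.times do
      pid = fork { exit!(0) }  # child leaves at once and skips at_exit handlers
      Process.wait(pid)
    end
  end

  { spawn: spawn_time / runs, fork: fork_time / runs }
end

times = startup_times
printf("spawn: %.1f ms  fork: %.1f ms\n", times[:spawn] * 1000, times[:fork] * 1000)
```

The absolute numbers depend on the machine and on the Ruby build, so they only give a feeling for the gap and do not replace the measurements above.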

PROBLEMS

BUG: object allocation during garbage collection phase

In my specific case, the Garbage-Collection (GC) had a problem with a programming error in the constructor of a callback or the unintentional deletion of a local variable.

ANALYSIS

First of all, the summary:

Every Ruby C-API function basically has the problem that there is NO error return, but in the event of an error a long jump is carried out to a remote address. To avoid this, a C->RUBY callback is not called directly, but via an API function called rb_protect, which is de facto a try..catch.

In the event of an error, the error message is caught and passed to the Programming-Language-Micro-Kernel (PLMK), which then takes over further processing or forwarding. So far, so good.
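Seen from the Ruby side, the calling convention described above can be modelled roughly as follows. This is an illustration only: protect_call is a hypothetical helper, not part of Ruby or of the PLMK, and a Ruby-level rescue is merely the closest analogue to what rb_protect does in C.

```ruby
# Hypothetical helper modelled on the idea behind rb_protect:
# run the callback, never let an exception escape, and hand the
# error back to the caller as an ordinary value.
def protect_call(callback, *args)
  [callback.call(*args), nil]
rescue Exception => e  # catch everything, not only StandardError
  [nil, e]
end

ok  = protect_call(->(x) { x * 2 }, 21)
bad = protect_call(->(x) { raise ArgumentError, "broken callback #{x}" }, 21)

p ok                 # [42, nil]
puts bad[1].message  # broken callback 21
```

The caller decides what to do with the error object, which is the role the PLMK takes over in the text above.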

The way the GC selects objects apparently affects the runtime, which is why the GC has been upgraded in the Ruby 3.3 series (3.3.5 is the current release). From the release notes:

### GC / Memory management

* Major performance improvements over Ruby 3.2
    * Young objects referenced by old objects are no longer immediately
      promoted to the old generation. This significantly reduces the frequency of
      major GC collections. [[Feature #19678]]
    * A new `REMEMBERED_WB_UNPROTECTED_OBJECTS_LIMIT_RATIO` tuning variable was
      introduced to control the number of unprotected objects cause a major GC
      collection to trigger. The default is set to `0.01` (1%). This significantly
      reduces the frequency of major GC collection. [[Feature #19571]]
    * Write Barriers were implemented for many core types that were missing them,
      notably `Time`, `Enumerator`, `MatchData`, `Method`, `File::Stat`, `BigDecimal`
      and several others. This significantly reduces minor GC collection time and major
      GC collection frequency.
    * Most core classes are now using Variable Width Allocation, notably `Hash`, `Time`,
      `Thread::Backtrace`, `Thread::Backtrace::Location`, `File::Stat`, `Method`.
      This makes these classes faster to allocate and free, use less memory and reduce
      heap fragmentation.
* `defined?(@ivar)` is optimized with Object Shapes.

But now comes the bombshell: Ruby explicitly does not allow a new object to be created during the Garbage-Collection (GC), which then immediately ends in a:

rb_bug("object allocation during garbage collection phase");

This restriction in itself raises doubts, but it gets even better: it applies not only to user code, but also to the ruby-kernel itself, so that in principle every Ruby API call could end with a CORE dump.

  • Keep in mind that an object is not just a larger data structure; a completely normal temporary variable is an object as well.
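That even a throwaway expression creates objects can be checked with the allocation counter of the GC. A small sketch, where the helper name allocations is an assumption and the exact count depends on the Ruby version:

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

name  = "GC"
count = allocations { "hello " + name }  # the result is a new temporary String
puts "temporary string expression allocated #{count} object(s)"
```

Every one of these allocations would trigger the rb_bug above if it happened while the GC is running.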

In Programming-Language-Micro-Kernel (PLMK), when an object is deleted, the destructor is called and this destructor does cleanup, which then de facto has consequences.
One of the consequences is that in RPC mode a delete-message is sent from the server to the client because it is quite common that when one object is deleted, all other dependent objects are also deleted.

  • Think of a database handle, for example.

Sending such a delete message is a task and this task is code that runs within the Garbage-Collection (GC) → funny.
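One common way to keep such work out of the GC is to let the destructor only record what has to be done and to send the messages later from the event loop. This is not the approach taken in this document (see SOLUTION below); the class DeferredDeletes and its methods are hypothetical, and in a real extension the recording step would be C code in the free function that writes into preallocated memory. The Ruby model only shows the ordering:

```ruby
# Hypothetical model: the destructor path records a handle, the event
# loop builds and sends the delete messages later, outside of the GC.
class DeferredDeletes
  def initialize
    @pending = []
  end

  # Destructor path: remember the handle, build and send nothing.
  def note(handle)
    @pending << handle
  end

  # Event loop: allocating objects is safe here, so the messages are sent now.
  # Returns the number of handles processed.
  def flush
    sent = 0
    until @pending.empty?
      yield @pending.shift
      sent += 1
    end
    sent
  end
end

queue = DeferredDeletes.new
queue.note(17)
queue.note(23)

messages = []
queue.flush { |handle| messages << "delete #{handle}" }
p messages  # ["delete 17", "delete 23"]
```

The price is that the client learns about the deletion a little later than with a direct call from the destructor.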

Message from the RUBY 3.3.5 kernel
NHI1_HOME/example/rb/LibLcConfigRpcServer.rb:939: [BUG] object allocation during garbage collection phase
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-linux-gnu]

-- Control frame information -----------------------------------------------
c:0006 p:0003 s:0029 e:000027 METHOD NHI1_HOME/example/rb/LibLcConfigRpcServer.rb:939 [FINISH]
c:0005 p:---- s:0023 e:000022 CFUNC  :WriteString
c:0004 p:0011 s:0019 e:000018 METHOD NHI1_HOME/example/rb/LibLcConfigRpcServer.rb:313 [FINISH]
c:0003 p:---- s:0013 e:000012 CFUNC  :ProcessEvent
c:0002 p:0038 s:0008 E:000340 EVAL   NHI1_HOME/example/rb/LcConfigServer.rb:72 [FINISH]
c:0001 p:0000 s:0003 E:0012a0 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
NHI1_HOME/example/rb/LcConfigServer.rb:72:in `<main>'
NHI1_HOME/example/rb/LcConfigServer.rb:72:in `ProcessEvent'
NHI1_HOME/example/rb/LibLcConfigRpcServer.rb:313:in `LcConfigWriteString'
NHI1_HOME/example/rb/LibLcConfigRpcServer.rb:313:in `WriteString'               <<< here RUBY decides to run the GC
NHI1_HOME/example/rb/LibLcConfigRpcServer.rb:939:in `ObjectDeleteCall'          <<< this is the function to sync the client

-- Threading information ---------------------------------------------------
Total ractor count: 1
Ruby thread count for this ractor: 1

-- C level backtrace information -------------------------------------------
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(rb_print_backtrace+0x14) [0x7f32cd75471b] vm_dump.c:820
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(rb_vm_bugreport) vm_dump.c:1151
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(bug_report_end+0x0) [0x7f32cd54aa3a] error.c:1042
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(rb_bug_without_die) error.c:1042
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(die+0x0) [0x7f32cd491a01] error.c:1050
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(rb_bug) error.c:1052

// ^^^ this is the BUG: rb_bug("object allocation during garbage collection phase");

NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(gc_event_hook_body+0x0) [0x7f32cd492626] gc.c:2867
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(newobj_slowpath) gc.c:2880
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(newobj_slowpath_wb_protected) gc.c:2895
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(newobj_of0+0x6c) [0x7f32cd578904] gc.c:2937
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(newobj_of) gc.c:2947
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(rb_wb_protected_newobj_of) gc.c:2962
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(ec_str_alloc_embed+0x15) [0x7f32cd6bbfc5] string.c:1695
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(ec_str_duplicate) string.c:1760

// ^^^ the ruby-kernel tries to create a NEW object in "ec_str_duplicate"

NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(rb_ec_str_resurrect) string.c:1812
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(vm_exec_core+0xbca) [0x7f32cd738e3a] insns.def:378
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(vm_exec_loop+0xa) [0x7f32cd73e2a9] vm.c:2513
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(rb_vm_exec) vm.c:2489
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(rb_vm_call_kw+0x137) [0x7f32cd7474c7] vm_eval.c:110
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(rb_method_call_with_block_kw+0x7e) [0x7f32cd636d8e] proc.c:2459
NHI1_BUILD/x86_64-suse-linux-gnu/debug2/theKernel/rb/.libs/librbmkkernel.so.22(rb_mkkernel_sCallMethodWithOne+0x1f) [0x7f32b1c6f76d] NHI1_HOME/theKernel/rb/MkCall_rb.c:41
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(rb_protect+0xe7) [0x7f32cd554637] eval.c:983
NHI1_BUILD/x86_64-suse-linux-gnu/debug2/theKernel/rb/.libs/librbmkkernel.so.22(rb_mkkernel_sRescue+0x19) [0x7f32b1c6f791] NHI1_HOME/theKernel/rb/MkCall_rb.c:29
NHI1_BUILD/x86_64-suse-linux-gnu/debug2/theKernel/rb/.libs/librbmkkernel.so.22(rb_mkkernel_ObjectDeleteCall+0xac) [0x7f32b1c6fc18] NHI1_HOME/theKernel/rb/MkCall_rb.c:192

// ^^^ this is my 'ObjectDeleteCall' to sync the server with the client

NHI1_BUILD/x86_64-suse-linux-gnu/debug2/theKernel/c/.libs/libmkkernel.so.22(MkObjectDeleteCall_RT+0x60) [0x7f32b1c0d24f] NHI1_HOME/theKernel/c/MkObjectS_mk.c:1220
NHI1_BUILD/x86_64-suse-linux-gnu/debug2/theKernel/c/.libs/libmkkernel.so.22(MkRefDecrWithoutSelf_RT+0x83) [0x7f32b1c0ee11] NHI1_HOME/theKernel/c/MkObjectS_mk.c:270
NHI1_BUILD/x86_64-suse-linux-gnu/debug2/theKernel/rb/.libs/librbmkkernel.so.22(rb_mkkernel_AtomDeleteSoft+0x1c) [0x7f32b1c74254] NHI1_HOME/theKernel/rb/MkObjectC_rb.c:236
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(RTYPEDDATA_TYPE+0x0) [0x7f32cd56ce53] gc.c:3500
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(rb_data_free) gc.c:3501
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(obj_free) gc.c:3659
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(gc_sweep_plane+0x45) [0x7f32cd56d9d5] gc.c:5681
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(gc_sweep_page) gc.c:5759
NHI1_EXT/x86_64-suse-linux-gnu/debug2/lib64/libruby.so.3.3(gc_sweep_step) gc.c:6048
...

// ^^^ the garbage-collector does its "job"

SOLUTION

Ruby has a general problem with Garbage-Collection (GC).

It seems to me that the Garbage-Collection (GC) approach is too ambitious, so that, for example, the C programmer has little influence on protecting his data. I have found two methods to be useful for limiting the problem, but that does not rule out that the problem simply continues to exist "hidden".

FIRST - protect a heap(global) VALUE:

  1. Protect every Global-Memory (HEAP) accessible VALUE with:
    • rb_gc_register_address(ADDRESS).
  2. Unprotect every Global-Memory (HEAP) accessible VALUE from [1] with:
    • rb_gc_unregister_address(ADDRESS).
  3. The ADDRESS is important here because it has to be a Global-Memory (HEAP) address.

Example: protect a VALUE stored in a C struct

If a struct is created use:

struct myStruct {
  VALUE myVal;
  // ..
};

// inside the function that creates the struct:
struct myStruct *myGlobalVar = (struct myStruct*) malloc(sizeof(struct myStruct));
myGlobalVar->myVal = someVALUE;
rb_gc_register_address(&myGlobalVar->myVal); // important, use the GLOBAL address

If a struct is destroyed use:

rb_gc_unregister_address(&myGlobalVar->myVal); // important, use the GLOBAL address
free(myGlobalVar);

SECOND - protect a new VALUE:

Protecting a Global-Memory (HEAP) accessible VALUE helped a lot but it was not enough to provide complete protection under ruby-3.3.5.

Additionally, each new TypedData_Wrap_Struct VALUE had to be protected with rb_gc_register_mark_object(VALUE).

Example: protect a new VALUE from TypedData_Wrap_Struct used in Ruby - Programming-Language-Micro-Kernel (PLMK)

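mk_inline VALUE MK(AtomObjCrt) (OT_LNG_CLASS_T clazz, MK_PTR type, MK_MNG mng, int objc, VALUE* objv)
{
  // wrap the native 'mng' pointer into a new Ruby object of class 'clazz'
  OT_LNG_T self = TypedData_Wrap_Struct (clazz, type, mng);
  // call the Ruby-level 'initialize' of the new object
  rb_obj_call_init(self, objc, objv);
  // the GC always marks 'self' from now on, a registered object is never collected
  rb_gc_register_mark_object(self);
  return self;
}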