Date: Fri, 08 Nov 2024 08:56:52 +0000 (UTC)
From: "rhenium (Kazuki Yamaguchi) via ruby-core"
Cc: "rhenium (Kazuki Yamaguchi)"
To: ruby-core@ml.ruby-lang.org
Reply-To: Ruby developers
Subject: [ruby-core:119836] [Ruby master Feature#20878] A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`

Issue #20878 has been updated by rhenium (Kazuki Yamaguchi).

byroot (Jean Boussier) wrote:
> #### Work inside RString allocated memory
> [...]
> The downside with this approach is that it contains a lot of inefficiencies, as `rb_str_set_len` will perform
> numerous safety checks, compute coderange, and write the string terminator on every invocation.

I thought `rb_str_set_len()` was supposed to be the efficient alternative to `rb_str_resize()` meant for such a purpose. I think an assert on the capacity or filling the terminator is cheap enough that it won't matter.

That it computes coderange is news to me - I found it has done so since commit:6b66b5fdedb2c9a9ee48e290d57ca7f8d55e01a2 / [Bug #19902] in 2023. I think correcting coderange after directly modifying the RString-managed buffer is the caller's responsibility. Perhaps it could be reversed?

----------------------------------------
Feature #20878: A new C API to create a String by adopting a pointer: `rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc)`
https://bugs.ruby-lang.org/issues/20878#change-110524

* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
### Context

A common use case when writing C extensions is to generate text or bytes into a buffer, and to return it wrapped in a Ruby String.

Examples are `JSON.generate(obj) -> String` and all other format serializers, compression libraries such as `Zlib.deflate`, and also methods such as `Time#strftime`.

### Current Solution

#### Work in a buffer and copy the result

The most often used solution is to work with a natively allocated buffer that the extension manages itself, and once the generation is done, call `rb_str_new*` to copy the result into memory managed by Ruby.

It works, but isn't very efficient because it causes an extra copy and an extra `free()`.

On `ruby/json` macro-benchmarks, this represents around 5% of the time spent in `JSON.generate`.

```c
/* Release the native buffer, but only if it lives on the heap. */
static void fbuffer_free(FBuffer *fb)
{
    if (fb->ptr && fb->type == FBUFFER_HEAP_ALLOCATED) {
        ruby_xfree(fb->ptr);
    }
}

/* Copy the buffer contents into a new Ruby String, then free the buffer. */
static VALUE fbuffer_to_s(FBuffer *fb)
{
    VALUE result = rb_utf8_str_new(FBUFFER_PTR(fb), FBUFFER_LEN(fb));
    fbuffer_free(fb);
    return result;
}
```

#### Work inside RString allocated memory

Another way this is currently done is to allocate an `RString` using `rb_str_buf_new`, and write into it with various functions such as `rb_str_catf`, or write past `RString.len` through `RSTRING_PTR` and then resize it with `rb_str_set_len`.

The downside with this approach is that it contains a lot of inefficiencies, as `rb_str_set_len` will perform numerous safety checks, compute coderange, and write the string terminator on every invocation.

Another major inefficiency is that this API makes it hard to control the buffer growth, so it can result in a lot more `realloc()` calls than manually managing the buffer.

This method is used by `Kernel#sprintf`, `Time#strftime` etc, and when I attempted to improve `Time#strftime` performance, this problem showed up as the biggest bottleneck:

  - https://github.com/ruby/ruby/pull/11547
  - https://github.com/ruby/ruby/pull/11544
  - https://github.com/ruby/ruby/pull/11542

### Proposed API

I think a more efficient way to do this would be to work with a native buffer, and then build an RString that "adopts" the memory region.

Technically, you can currently do this by reaching directly into `RString` members, but I don't think it's clean, and a dedicated API would be preferable:

```c
/**
 * Similar to rb_str_new(), but it adopts the pointer instead of copying.
 *
 * @param[in]  ptr             A memory region of `capa` bytes length. MUST have been allocated with `ruby_xmalloc`.
 * @param[in]  len             Length of the string, in bytes, not including the
 *                             terminating NUL character, not including extra capacity.
 * @param[in]  capa            The usable length of `ptr`, in bytes, including the
 *                             terminating NUL character.
 * @param[in]  enc             Encoding of `ptr`.
 * @exception  rb_eArgError    `len` is negative.
 * @return     An instance of ::rb_cString, of `len` bytes length, `capa - 1` bytes capacity,
 *             and of `enc` encoding.
 * @pre        At least `capa` bytes of continuous memory region shall be
 *             accessible via `ptr`.
 * @pre        `ptr` MUST have been allocated with `ruby_xmalloc`.
 * @pre        `ptr` MUST NOT be manually freed after `rb_enc_str_adopt` has been called.
 * @note       `enc` can be a null pointer. It can also be seen as a routine
 *             identical to rb_usascii_str_new() then.
 */
VALUE rb_enc_str_adopt(const char *ptr, long len, long capa, rb_encoding *enc);
```

An alternative to the `adopt` term could be `move`.

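To make the ownership transfer described above concrete, here is a minimal plain C sketch of the pattern. It does not use the Ruby C API and is not the proposed implementation: `demo_buf`, `demo_str`, `demo_buf_append`, `demo_str_adopt` and `demo_str_free` are hypothetical names invented for this illustration, and `realloc()`/`free()` stand in for `ruby_xrealloc`/`ruby_xfree`. It shows a caller-controlled growth policy (doubling) and a hand-off that moves the pointer instead of copying the bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer owned by the extension. */
typedef struct {
    char    *ptr;      /* heap memory, or NULL while empty */
    size_t   len;      /* bytes in use, excluding the terminating NUL */
    size_t   capa;     /* bytes allocated, including room for the NUL */
    unsigned reallocs; /* number of realloc() calls, kept for illustration */
} demo_buf;

/* Hypothetical stand-in for a string object that owns its memory. */
typedef struct {
    char  *ptr;
    size_t len;
    size_t capa;
} demo_str;

/* Append n bytes, doubling the capacity whenever more room is needed.
 * Returns 0 on success, -1 if allocation fails.
 * Size overflow checks are omitted for brevity. */
static int demo_buf_append(demo_buf *b, const char *src, size_t n)
{
    size_t need = b->len + n + 1; /* +1 for the terminating NUL */
    if (need > b->capa) {
        size_t capa = b->capa ? b->capa : 16;
        while (capa < need) capa *= 2;
        char *p = realloc(b->ptr, capa);
        if (!p) return -1;
        b->ptr = p;
        b->capa = capa;
        b->reallocs++;
    }
    memcpy(b->ptr + b->len, src, n);
    b->len += n;
    b->ptr[b->len] = '\0';
    return 0;
}

/* Move the memory region into the string without copying it.
 * The buffer is reset so it can no longer free or reuse the region. */
static demo_str demo_str_adopt(demo_buf *b)
{
    assert(b->ptr != NULL && b->len < b->capa);
    demo_str s = { b->ptr, b->len, b->capa };
    b->ptr = NULL;
    b->len = 0;
    b->capa = 0;
    return s;
}

/* The string is now the sole owner and releases the memory. */
static void demo_str_free(demo_str *s)
{
    free(s->ptr);
    s->ptr = NULL;
    s->len = 0;
    s->capa = 0;
}
```

With this doubling policy and a 16-byte initial capacity, one hundred two-byte appends trigger only five `realloc()` calls, and the hand-off itself performs no copy and no `free()`. A real `rb_enc_str_adopt` would additionally have to deal with encoding, coderange and GC accounting, which this sketch leaves out.
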