From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on starla X-Spam-Level: X-Spam-Status: No, score=-1.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,MAILING_LIST_MULTI,SPF_HELO_PASS,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 Received: from nue.mailmanlists.eu (nue.mailmanlists.eu [94.130.110.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 9044B1F47A for ; Sat, 31 Aug 2024 18:05:29 +0000 (UTC) Authentication-Results: dcvr.yhbt.net; dkim=pass (1024-bit key; unprotected) header.d=ml.ruby-lang.org header.i=@ml.ruby-lang.org header.a=rsa-sha256 header.s=mail header.b=OvpOVNU0; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ruby-lang.org header.i=@ruby-lang.org header.a=rsa-sha256 header.s=s1 header.b=ITQU5lOr; dkim-atps=neutral DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ml.ruby-lang.org; s=mail; t=1725127526; bh=MgJd72P0CEOUSChdUdLhrNHlz3LI3VAdKpztH1cZSQM=; h=Date:References:To:Reply-To:Subject:List-Id:List-Archive: List-Help:List-Owner:List-Post:List-Subscribe:List-Unsubscribe: From:Cc:From; b=OvpOVNU01y19RNxVQoXuWVrTEW8bannV/CAaZtdyLXcJDSO0o9Kfgi1YDFnMCzABP O2VBH26kmrGoSReTl/JQ4j1BbFkZLg2qt02SPBVyS5A7RO28M7NtpB5p7zoDipbO16 xWS7aa/xydeNtneclkHwBwrSYeI0M2S+d1syzu/A= Received: from nue.mailmanlists.eu (localhost [IPv6:::1]) by nue.mailmanlists.eu (Postfix) with ESMTP id A139943D03 for ; Sat, 31 Aug 2024 18:05:26 +0000 (UTC) Authentication-Results: nue.mailmanlists.eu; dkim=pass (2048-bit key; unprotected) header.d=ruby-lang.org header.i=@ruby-lang.org header.a=rsa-sha256 header.s=s1 header.b=ITQU5lOr; dkim-atps=neutral Received: from s.wfbtzhsv.outbound-mail.sendgrid.net (s.wfbtzhsv.outbound-mail.sendgrid.net [159.183.224.104]) by nue.mailmanlists.eu (Postfix) with ESMTPS id 1946343CA2 for ; Sat, 31 Aug 2024 18:05:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ruby-lang.org; h=from:references:subject:mime-version:content-type: content-transfer-encoding:list-id:to:cc:content-type:from:subject:to; s=s1; bh=loMAlEyOBzk2B6eZ7gMVnwixS+QwnBLr3kAfO6wliVU=; b=ITQU5lOrkmuScYaVQMQtkoT1EIBOpy1F67DHwhc9YsZa3bFYsgLjL1bxZ57DfO9cUzaj PrWUfdF0Le9aJzUMuSzuFZ4vdsyK7jT5Jr5wgQEoVq6ewJ8qKVQd4dFzr0Ja//DqdQ6EcV glcXZ9RjjQ+1VKHCFsFJMfIBFPToEa+/YiPXIfaPSoKYCDV2aDgSaqrAgBr5YrIFlAS8tn O4dsJP58Dj8PRZvqRhzsooY9ZDxFMA7v4PR/J5/tSBtzVx9GUpr9fFJZ+6vyxyLD3c0XOn l3KxXRgsvfHnkdVljzNxu/rjwc85mTL/I6skrjV4rN+8yce0jJ08d+rewmbR2lDg== Received: by recvd-6c5668c69-7jbgl with SMTP id recvd-6c5668c69-7jbgl-1-66D35B59-28 2024-08-31 18:05:13.807162401 +0000 UTC m=+849114.431678443 Received: from herokuapp.com (unknown) by geopod-ismtpd-24 (SG) with ESMTP id E-8LDNWpRLWAL82P_0w27A for ; Sat, 31 Aug 2024 18:05:13.756 +0000 (UTC) Date: Sat, 31 Aug 2024 18:05:13 +0000 (UTC) Message-ID: References: Mime-Version: 1.0 X-Redmine-Project: ruby-master X-Redmine-Issue-Tracker: Feature X-Redmine-Issue-Id: 20709 X-Redmine-Issue-Author: zack.ref@gmail.com X-Redmine-Issue-Priority: Normal X-Redmine-Sender: zack.ref@gmail.com X-Mailer: Redmine X-Redmine-Host: bugs.ruby-lang.org X-Redmine-Site: Ruby Issue Tracking System X-Auto-Response-Suppress: All Auto-Submitted: auto-generated X-Redmine-MailingListIntegration-Message-Ids: 95634 X-SG-EID: =?us-ascii?Q?u001=2EJZWIpMPddRf8N2BxZz+XHwNQLO3r0ERoQQgr+KlfaAb4nv0lOqLYRPMVs?= =?us-ascii?Q?Se7US5DNst6eJwRvaIV4=2Fu54GW82fZdpBVO+DRo?= =?us-ascii?Q?gUb2SJfL46gbozUxsVh6tMoZSefOfR2LLcfEbOW?= =?us-ascii?Q?oWdZr3MDF1Z6mq8Xijua+okrEjXHVZQGZ0lnnbG?= =?us-ascii?Q?fydiW4iCDu94o95nnN3JkvrCJTU7Ixr4PYib50k?= =?us-ascii?Q?suCn+PXHQZ=2FlR5zkNLSeFPFcwDMPSYSnUGCRHsP?= =?us-ascii?Q?2InrxLIhcD+jk8WNJuWVoxMSWg=3D=3D?= To: ruby-core@ml.ruby-lang.org X-Entity-ID: u001.I8uzylDtAfgbeCOeLBYDww== Message-ID-Hash: ZKYXDCVAOXG5BV4AGFGUPQKRMJ76TNGN X-Message-ID-Hash: ZKYXDCVAOXG5BV4AGFGUPQKRMJ76TNGN X-MailFrom: bounces+313651-b711-ruby-core=ml.ruby-lang.org@em5188.ruby-lang.org X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.9 Precedence: list Reply-To: Ruby developers Subject: [ruby-core:118998] [Ruby master Feature#20709] Improve String#rindex performance on OSX List-Id: Ruby developers Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: From: "zack.ref@gmail.com (Zack Deveau) via ruby-core" Cc: "zack.ref@gmail.com (Zack Deveau)" Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Issue #20709 has been reported by zack.ref@gmail.com (Zack Deveau). ---------------------------------------- Feature #20709: Improve String#rindex performance on OSX https://bugs.ruby-lang.org/issues/20709 * Author: zack.ref@gmail.com (Zack Deveau) * Status: Open ---------------------------------------- `String#rindex` is much slower on OSX than on Linux hosts. This appears due to the lack of a `memrchr` implementation on OSX. The [fallback solution](https://github.com/ruby/ruby/blob/e69945fc57efba5c8895c21235e109145865952d/string.c#L4348-L4402) currently performs a `memcmp` on each character in the search string looking for a substring match. ```c while (s) { if (memcmp(s, t, slen) == 0) { return s - sbeg; } if (s <= sbeg) break; s = rb_enc_prev_char(sbeg, s, e, enc); } ``` ``` ruby-master | 10 char substring (long) near the end of a search string | osx search string enc: UTF-8 search string size: 8010 user system total real index (long substring) 0.000075 0.000000 0.000075 ( 0.000074) rindex 0.000445 0.000000 0.000445 ( 0.000448) ``` Proposed improvement is available on this pull request: This patch proposes using an implementation of `memrchr` made for the interpreter on hosts that lack a native implementation. Below are benchmark results which demonstrate that this approach either improves or maintains performance of `String#rindex` for all cases we could come up with. ```c static void* rb_memrchr(const char *search_str, int chr, long search_len) { const char *ptr = search_str + search_len; do { if (*(--ptr) == chr) return (void *)ptr; } while (ptr >= search_str); return ((void *)0); } ``` Special thanks to @jhawthorn who provided some helpful feedback on this solution during development! # Benchmarks Each of the following were made with a variant of the following general purpose benchmark. Note that `String#index` speed can be [impacted by the size of the provided substring](https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274) (Denoted as long/short substring in the results, assume long substring when none is specified). ``` require "benchmark" substring = "98279827AA" the_string = "" 800.times{ the_string.concat(('a'..'z').to_a.shuffle[0,8].join)} the_string.concat(substring) 200.times{ the_string.concat(('a'..'z').to_a.shuffle[0,8].join)} puts the_string.encoding puts the_string.size Benchmark.bm do |x| x.report(:index) { 100.times { the_string.index(substring) } } x.report(:rindex) { 100.times { the_string.rindex(substring) } } end ``` ``` ruby-master | substring near the end | osx https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 UTF-8 user system total real index 0.000075 0.000000 0.000075 ( 0.000074) rindex 0.000445 0.000000 0.000445 ( 0.000448) ------------------------------------------------------ ruby-patched | substring near the end | osx https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 UTF-8 user system total real index (short substring) 0.000400 0.000001 0.000401 ( 0.000399) index (long substring) 0.000080 0.000000 0.000080 ( 0.000080) rindex 0.000057 0.000001 0.000058 ( 0.000057) ``` ``` ruby-master | substring in the middle | osx UTF-8 user system total real index 0.000031 0.000000 0.000031 ( 0.000030) rindex 0.001086 0.000002 0.001088 ( 0.001088) ------------------------------------------------------ ruby-patched | substring in the middle | osx UTF-8 user system total real index 0.000075 0.000000 0.000075 ( 0.000075) rindex 0.000130 0.000000 0.000130 ( 0.000133) ``` ``` ruby-master | substring in the middle with a short substring | osx https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 UTF-8 user system total real index (short substring) 0.000243 0.000004 0.000247 ( 0.000246) rindex 0.001176 0.000005 0.001181 ( 0.001182) ------------------------------------------------------ ruby-patched | substring in the middle with a short substring | osx https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274 UTF-8 user system total real index (short substring) 0.000255 0.000005 0.000260 ( 0.000259) rindex 0.000130 0.000000 0.000130 ( 0.000133) ``` ``` ruby-master | substring at the end | osx UTF-8 user system total real index 0.000049 0.000001 0.000050 ( 0.000048) rindex 0.000008 0.000001 0.000009 ( 0.000007) ------------------------------------------------------ ruby-patched | substring at the end | osx UTF-8 user system total real index 0.000075 0.000000 0.000075 ( 0.000074) rindex 0.000006 0.000000 0.000006 ( 0.000006) ``` ``` ruby-master | substring at the start | osx UTF-8 user system total real index 0.000008 0.000001 0.000009 ( 0.000007) rindex 0.002444 0.000008 0.002452 ( 0.002452) ------------------------------------------------------ ruby-patched | substring at the start | osx UTF-8 user system total real index 0.000015 0.000000 0.000015 ( 0.000015) rindex 0.000280 0.000001 0.000281 ( 0.000280) ``` -- https://bugs.ruby-lang.org/ ______________________________________________ ruby-core mailing list -- ruby-core@ml.ruby-lang.org To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/