[PR PATCH] glibc: fix memalign performance regression


* [PR PATCH] glibc: fix memalign performance regression
@ 2023-12-26 22:40 tornaria
  2023-12-26 22:52 ` [PR PATCH] [Updated] " tornaria
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: tornaria @ 2023-12-26 22:40 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 1936 bytes --]

There is a new pull request by tornaria against master on the void-packages repository

https://github.com/tornaria/void-packages glibc
https://github.com/void-linux/void-packages/pull/47914

glibc: fix memalign performance regression

The upgrade to glibc 2.38 brought a severe performance regression in SageMath:
```
$ time python -c 'from sage.graphs.generators.distance_regular import DoubleGrassmannGraph; print(DoubleGrassmannGraph(2,2))'
<string>:1: UserWarning: Resolving lazy import GF during startup
<string>:1: UserWarning: Resolving lazy import VectorSpace during startup
Double Grassmann graph (5, 2, 2)

real	0m30.101s
user	0m29.959s
sys	0m0.060s
```
The same command takes about 1-2 seconds with glibc 2.36, and again with 2.38 after this PR.

Thanks to @oreo639 for identifying the cause as glibc bug 30723: https://sourceware.org/bugzilla/show_bug.cgi?id=30723

With the two upstream patches for that bug applied, all the performance regressions I was seeing are gone.

#### Testing the changes
- I tested the changes in this PR: **briefly**


A patch file from https://github.com/void-linux/void-packages/pull/47914.patch is attached

[-- Attachment #2: github-pr-glibc-47914.patch --]
[-- Type: text/x-diff, Size: 21088 bytes --]

From 67b1c68f1d7bd7f343e9778f3a19c1495e15116b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:28:39 -0300
Subject: [PATCH] glibc: fix memalign performance regression

---
 ...1105852568c3ebc712225ae78b8c8ba31a78.patch | 296 ++++++++++++++++++
 ...fc1cf094406a138e4d1bcf9553e59edcf89d.patch | 252 +++++++++++++++
 srcpkgs/glibc/template                        |   2 +-
 3 files changed, 549 insertions(+), 1 deletion(-)
 create mode 100644 srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch
 create mode 100644 srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch

diff --git a/srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch b/srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch
new file mode 100644
index 0000000000000..56d5d47c031a0
--- /dev/null
+++ b/srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch
@@ -0,0 +1,296 @@
+From 542b1105852568c3ebc712225ae78b8c8ba31a78 Mon Sep 17 00:00:00 2001
+From: Florian Weimer <fweimer@redhat.com>
+Date: Fri, 11 Aug 2023 11:18:17 +0200
+Subject: [PATCH] malloc: Enable merging of remainders in memalign (bug 30723)
+
+Previously, calling _int_free from _int_memalign could put remainders
+into the tcache or into fastbins, where they are invisible to the
+low-level allocator.  This results in missed merge opportunities
+because once these freed chunks become available to the low-level
+allocator, further memalign allocations (even of the same size are)
+likely obstructing merges.
+
+Furthermore, during forwards merging in _int_memalign, do not
+completely give up when the remainder is too small to serve as a
+chunk on its own.  We can still give it back if it can be merged
+with the following unused chunk.  This makes it more likely that
+memalign calls in a loop achieve a compact memory layout,
+independently of initial heap layout.
+
+Drop some useless (unsigned long) casts along the way, and tweak
+the style to more closely match GNU on changed lines.
+
+Reviewed-by: DJ Delorie <dj@redhat.com>
+---
+ malloc/malloc.c | 197 +++++++++++++++++++++++++++++-------------------
+ 1 file changed, 121 insertions(+), 76 deletions(-)
+
+diff --git a/malloc/malloc.c b/malloc/malloc.c
+index e2f1a615a4..948f9759af 100644
+--- a/malloc/malloc.c
++++ b/malloc/malloc.c
+@@ -1086,6 +1086,11 @@ typedef struct malloc_chunk* mchunkptr;
+ 
+ static void*  _int_malloc(mstate, size_t);
+ static void     _int_free(mstate, mchunkptr, int);
++static void _int_free_merge_chunk (mstate, mchunkptr, INTERNAL_SIZE_T);
++static INTERNAL_SIZE_T _int_free_create_chunk (mstate,
++					       mchunkptr, INTERNAL_SIZE_T,
++					       mchunkptr, INTERNAL_SIZE_T);
++static void _int_free_maybe_consolidate (mstate, INTERNAL_SIZE_T);
+ static void*  _int_realloc(mstate, mchunkptr, INTERNAL_SIZE_T,
+ 			   INTERNAL_SIZE_T);
+ static void*  _int_memalign(mstate, size_t, size_t);
+@@ -4637,31 +4642,52 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+     if (!have_lock)
+       __libc_lock_lock (av->mutex);
+ 
+-    nextchunk = chunk_at_offset(p, size);
+-
+-    /* Lightweight tests: check whether the block is already the
+-       top block.  */
+-    if (__glibc_unlikely (p == av->top))
+-      malloc_printerr ("double free or corruption (top)");
+-    /* Or whether the next chunk is beyond the boundaries of the arena.  */
+-    if (__builtin_expect (contiguous (av)
+-			  && (char *) nextchunk
+-			  >= ((char *) av->top + chunksize(av->top)), 0))
+-	malloc_printerr ("double free or corruption (out)");
+-    /* Or whether the block is actually not marked used.  */
+-    if (__glibc_unlikely (!prev_inuse(nextchunk)))
+-      malloc_printerr ("double free or corruption (!prev)");
+-
+-    nextsize = chunksize(nextchunk);
+-    if (__builtin_expect (chunksize_nomask (nextchunk) <= CHUNK_HDR_SZ, 0)
+-	|| __builtin_expect (nextsize >= av->system_mem, 0))
+-      malloc_printerr ("free(): invalid next size (normal)");
++    _int_free_merge_chunk (av, p, size);
+ 
+-    free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
++    if (!have_lock)
++      __libc_lock_unlock (av->mutex);
++  }
++  /*
++    If the chunk was allocated via mmap, release via munmap().
++  */
++
++  else {
++    munmap_chunk (p);
++  }
++}
++
++/* Try to merge chunk P of SIZE bytes with its neighbors.  Put the
++   resulting chunk on the appropriate bin list.  P must not be on a
++   bin list yet, and it can be in use.  */
++static void
++_int_free_merge_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size)
++{
++  mchunkptr nextchunk = chunk_at_offset(p, size);
++
++  /* Lightweight tests: check whether the block is already the
++     top block.  */
++  if (__glibc_unlikely (p == av->top))
++    malloc_printerr ("double free or corruption (top)");
++  /* Or whether the next chunk is beyond the boundaries of the arena.  */
++  if (__builtin_expect (contiguous (av)
++			&& (char *) nextchunk
++			>= ((char *) av->top + chunksize(av->top)), 0))
++    malloc_printerr ("double free or corruption (out)");
++  /* Or whether the block is actually not marked used.  */
++  if (__glibc_unlikely (!prev_inuse(nextchunk)))
++    malloc_printerr ("double free or corruption (!prev)");
++
++  INTERNAL_SIZE_T nextsize = chunksize(nextchunk);
++  if (__builtin_expect (chunksize_nomask (nextchunk) <= CHUNK_HDR_SZ, 0)
++      || __builtin_expect (nextsize >= av->system_mem, 0))
++    malloc_printerr ("free(): invalid next size (normal)");
++
++  free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
+ 
+-    /* consolidate backward */
+-    if (!prev_inuse(p)) {
+-      prevsize = prev_size (p);
++  /* Consolidate backward.  */
++  if (!prev_inuse(p))
++    {
++      INTERNAL_SIZE_T prevsize = prev_size (p);
+       size += prevsize;
+       p = chunk_at_offset(p, -((long) prevsize));
+       if (__glibc_unlikely (chunksize(p) != prevsize))
+@@ -4669,9 +4695,25 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+       unlink_chunk (av, p);
+     }
+ 
+-    if (nextchunk != av->top) {
++  /* Write the chunk header, maybe after merging with the following chunk.  */
++  size = _int_free_create_chunk (av, p, size, nextchunk, nextsize);
++  _int_free_maybe_consolidate (av, size);
++}
++
++/* Create a chunk at P of SIZE bytes, with SIZE potentially increased
++   to cover the immediately following chunk NEXTCHUNK of NEXTSIZE
++   bytes (if NEXTCHUNK is unused).  The chunk at P is not actually
++   read and does not have to be initialized.  After creation, it is
++   placed on the appropriate bin list.  The function returns the size
++   of the new chunk.  */
++static INTERNAL_SIZE_T
++_int_free_create_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size,
++			mchunkptr nextchunk, INTERNAL_SIZE_T nextsize)
++{
++  if (nextchunk != av->top)
++    {
+       /* get and clear inuse bit */
+-      nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
++      bool nextinuse = inuse_bit_at_offset (nextchunk, nextsize);
+ 
+       /* consolidate forward */
+       if (!nextinuse) {
+@@ -4686,8 +4728,8 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+ 	been given one chance to be used in malloc.
+       */
+ 
+-      bck = unsorted_chunks(av);
+-      fwd = bck->fd;
++      mchunkptr bck = unsorted_chunks (av);
++      mchunkptr fwd = bck->fd;
+       if (__glibc_unlikely (fwd->bk != bck))
+ 	malloc_printerr ("free(): corrupted unsorted chunks");
+       p->fd = fwd;
+@@ -4706,61 +4748,52 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+       check_free_chunk(av, p);
+     }
+ 
+-    /*
+-      If the chunk borders the current high end of memory,
+-      consolidate into top
+-    */
+-
+-    else {
++  else
++    {
++      /* If the chunk borders the current high end of memory,
++	 consolidate into top.  */
+       size += nextsize;
+       set_head(p, size | PREV_INUSE);
+       av->top = p;
+       check_chunk(av, p);
+     }
+ 
+-    /*
+-      If freeing a large space, consolidate possibly-surrounding
+-      chunks. Then, if the total unused topmost memory exceeds trim
+-      threshold, ask malloc_trim to reduce top.
+-
+-      Unless max_fast is 0, we don't know if there are fastbins
+-      bordering top, so we cannot tell for sure whether threshold
+-      has been reached unless fastbins are consolidated.  But we
+-      don't want to consolidate on each free.  As a compromise,
+-      consolidation is performed if FASTBIN_CONSOLIDATION_THRESHOLD
+-      is reached.
+-    */
++  return size;
++}
+ 
+-    if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) {
++/* If freeing a large space, consolidate possibly-surrounding
++   chunks.  Then, if the total unused topmost memory exceeds trim
++   threshold, ask malloc_trim to reduce top.  */
++static void
++_int_free_maybe_consolidate (mstate av, INTERNAL_SIZE_T size)
++{
++  /* Unless max_fast is 0, we don't know if there are fastbins
++     bordering top, so we cannot tell for sure whether threshold has
++     been reached unless fastbins are consolidated.  But we don't want
++     to consolidate on each free.  As a compromise, consolidation is
++     performed if FASTBIN_CONSOLIDATION_THRESHOLD is reached.  */
++  if (size >= FASTBIN_CONSOLIDATION_THRESHOLD)
++    {
+       if (atomic_load_relaxed (&av->have_fastchunks))
+ 	malloc_consolidate(av);
+ 
+-      if (av == &main_arena) {
++      if (av == &main_arena)
++	{
+ #ifndef MORECORE_CANNOT_TRIM
+-	if ((unsigned long)(chunksize(av->top)) >=
+-	    (unsigned long)(mp_.trim_threshold))
+-	  systrim(mp_.top_pad, av);
++	  if (chunksize (av->top) >= mp_.trim_threshold)
++	    systrim (mp_.top_pad, av);
+ #endif
+-      } else {
+-	/* Always try heap_trim(), even if the top chunk is not
+-	   large, because the corresponding heap might go away.  */
+-	heap_info *heap = heap_for_ptr(top(av));
++	}
++      else
++	{
++	  /* Always try heap_trim, even if the top chunk is not large,
++	     because the corresponding heap might go away.  */
++	  heap_info *heap = heap_for_ptr (top (av));
+ 
+-	assert(heap->ar_ptr == av);
+-	heap_trim(heap, mp_.top_pad);
+-      }
++	  assert (heap->ar_ptr == av);
++	  heap_trim (heap, mp_.top_pad);
++	}
+     }
+-
+-    if (!have_lock)
+-      __libc_lock_unlock (av->mutex);
+-  }
+-  /*
+-    If the chunk was allocated via mmap, release via munmap().
+-  */
+-
+-  else {
+-    munmap_chunk (p);
+-  }
+ }
+ 
+ /*
+@@ -5221,7 +5254,7 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+                 (av != &main_arena ? NON_MAIN_ARENA : 0));
+       set_inuse_bit_at_offset (newp, newsize);
+       set_head_size (p, leadsize | (av != &main_arena ? NON_MAIN_ARENA : 0));
+-      _int_free (av, p, 1);
++      _int_free_merge_chunk (av, p, leadsize);
+       p = newp;
+ 
+       assert (newsize >= nb &&
+@@ -5232,15 +5265,27 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+   if (!chunk_is_mmapped (p))
+     {
+       size = chunksize (p);
+-      if ((unsigned long) (size) > (unsigned long) (nb + MINSIZE))
++      mchunkptr nextchunk = chunk_at_offset(p, size);
++      INTERNAL_SIZE_T nextsize = chunksize(nextchunk);
++      if (size > nb)
+         {
+           remainder_size = size - nb;
+-          remainder = chunk_at_offset (p, nb);
+-          set_head (remainder, remainder_size | PREV_INUSE |
+-                    (av != &main_arena ? NON_MAIN_ARENA : 0));
+-          set_head_size (p, nb);
+-          _int_free (av, remainder, 1);
+-        }
++	  if (remainder_size >= MINSIZE
++	      || nextchunk == av->top
++	      || !inuse_bit_at_offset (nextchunk, nextsize))
++	    {
++	      /* We can only give back the tail if it is larger than
++		 MINSIZE, or if the following chunk is unused (top
++		 chunk or unused in-heap chunk).  Otherwise we would
++		 create a chunk that is smaller than MINSIZE.  */
++	      remainder = chunk_at_offset (p, nb);
++	      set_head_size (p, nb);
++	      remainder_size = _int_free_create_chunk (av, remainder,
++						       remainder_size,
++						       nextchunk, nextsize);
++	      _int_free_maybe_consolidate (av, remainder_size);
++	    }
++	}
+     }
+ 
+   check_inuse_chunk (av, p);
diff --git a/srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch b/srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch
new file mode 100644
index 0000000000000..4615c7b035cc7
--- /dev/null
+++ b/srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch
@@ -0,0 +1,252 @@
+From 0dc7fc1cf094406a138e4d1bcf9553e59edcf89d Mon Sep 17 00:00:00 2001
+From: Florian Weimer <fweimer@redhat.com>
+Date: Thu, 10 Aug 2023 19:36:56 +0200
+Subject: [PATCH] malloc: Remove bin scanning from memalign (bug 30723)
+
+On the test workload (mpv --cache=yes with VP9 video decoding), the
+bin scanning has a very poor success rate (less than 2%).  The tcache
+scanning has about 50% success rate, so keep that.
+
+Update comments in malloc/tst-memalign-2 to indicate the purpose
+of the tests.  Even with the scanning removed, the additional
+merging opportunities since commit 542b1105852568c3ebc712225ae78b
+("malloc: Enable merging of remainders in memalign (bug 30723)")
+are sufficient to pass the existing large bins test.
+
+Remove leftover variables from _int_free from refactoring in the
+same commit.
+
+Reviewed-by: DJ Delorie <dj@redhat.com>
+---
+ malloc/malloc.c         | 169 ++--------------------------------------
+ malloc/tst-memalign-2.c |   7 +-
+ 2 files changed, 10 insertions(+), 166 deletions(-)
+
+diff --git a/malloc/malloc.c b/malloc/malloc.c
+index 948f9759af..d0bbbf3710 100644
+--- a/malloc/malloc.c
++++ b/malloc/malloc.c
+@@ -4488,12 +4488,6 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+ {
+   INTERNAL_SIZE_T size;        /* its size */
+   mfastbinptr *fb;             /* associated fastbin */
+-  mchunkptr nextchunk;         /* next contiguous chunk */
+-  INTERNAL_SIZE_T nextsize;    /* its size */
+-  int nextinuse;               /* true if nextchunk is used */
+-  INTERNAL_SIZE_T prevsize;    /* size of previous contiguous chunk */
+-  mchunkptr bck;               /* misc temp for linking */
+-  mchunkptr fwd;               /* misc temp for linking */
+ 
+   size = chunksize (p);
+ 
+@@ -5032,42 +5026,6 @@ _int_realloc (mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
+    ------------------------------ memalign ------------------------------
+  */
+ 
+-/* Returns 0 if the chunk is not and does not contain the requested
+-   aligned sub-chunk, else returns the amount of "waste" from
+-   trimming.  NB is the *chunk* byte size, not the user byte
+-   size.  */
+-static size_t
+-chunk_ok_for_memalign (mchunkptr p, size_t alignment, size_t nb)
+-{
+-  void *m = chunk2mem (p);
+-  INTERNAL_SIZE_T size = chunksize (p);
+-  void *aligned_m = m;
+-
+-  if (__glibc_unlikely (misaligned_chunk (p)))
+-    malloc_printerr ("_int_memalign(): unaligned chunk detected");
+-
+-  aligned_m = PTR_ALIGN_UP (m, alignment);
+-
+-  INTERNAL_SIZE_T front_extra = (intptr_t) aligned_m - (intptr_t) m;
+-
+-  /* We can't trim off the front as it's too small.  */
+-  if (front_extra > 0 && front_extra < MINSIZE)
+-    return 0;
+-
+-  /* If it's a perfect fit, it's an exception to the return value rule
+-     (we would return zero waste, which looks like "not usable"), so
+-     handle it here by returning a small non-zero value instead.  */
+-  if (size == nb && front_extra == 0)
+-    return 1;
+-
+-  /* If the block we need fits in the chunk, calculate total waste.  */
+-  if (size > nb + front_extra)
+-    return size - nb;
+-
+-  /* Can't use this chunk.  */
+-  return 0;
+-}
+-
+ /* BYTES is user requested bytes, not requested chunksize bytes.  */
+ static void *
+ _int_memalign (mstate av, size_t alignment, size_t bytes)
+@@ -5082,7 +5040,6 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+   mchunkptr remainder;            /* spare room at end to split off */
+   unsigned long remainder_size;   /* its size */
+   INTERNAL_SIZE_T size;
+-  mchunkptr victim;
+ 
+   nb = checked_request2size (bytes);
+   if (nb == 0)
+@@ -5101,129 +5058,13 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+      we don't find anything in those bins, the common malloc code will
+      scan starting at 2x.  */
+ 
+-  /* This will be set if we found a candidate chunk.  */
+-  victim = NULL;
+-
+-  /* Fast bins are singly-linked, hard to remove a chunk from the middle
+-     and unlikely to meet our alignment requirements.  We have not done
+-     any experimentation with searching for aligned fastbins.  */
+-
+-  if (av != NULL)
+-    {
+-      int first_bin_index;
+-      int first_largebin_index;
+-      int last_bin_index;
+-
+-      if (in_smallbin_range (nb))
+-	first_bin_index = smallbin_index (nb);
+-      else
+-	first_bin_index = largebin_index (nb);
+-
+-      if (in_smallbin_range (nb * 2))
+-	last_bin_index = smallbin_index (nb * 2);
+-      else
+-	last_bin_index = largebin_index (nb * 2);
+-
+-      first_largebin_index = largebin_index (MIN_LARGE_SIZE);
+-
+-      int victim_index;                 /* its bin index */
+-
+-      for (victim_index = first_bin_index;
+-	   victim_index < last_bin_index;
+-	   victim_index ++)
+-	{
+-	  victim = NULL;
+-
+-	  if (victim_index < first_largebin_index)
+-	    {
+-	      /* Check small bins.  Small bin chunks are doubly-linked despite
+-		 being the same size.  */
+-
+-	      mchunkptr fwd;                    /* misc temp for linking */
+-	      mchunkptr bck;                    /* misc temp for linking */
+-
+-	      bck = bin_at (av, victim_index);
+-	      fwd = bck->fd;
+-	      while (fwd != bck)
+-		{
+-		  if (chunk_ok_for_memalign (fwd, alignment, nb) > 0)
+-		    {
+-		      victim = fwd;
+-
+-		      /* Unlink it */
+-		      victim->fd->bk = victim->bk;
+-		      victim->bk->fd = victim->fd;
+-		      break;
+-		    }
+-
+-		  fwd = fwd->fd;
+-		}
+-	    }
+-	  else
+-	    {
+-	      /* Check large bins.  */
+-	      mchunkptr fwd;                    /* misc temp for linking */
+-	      mchunkptr bck;                    /* misc temp for linking */
+-	      mchunkptr best = NULL;
+-	      size_t best_size = 0;
+-
+-	      bck = bin_at (av, victim_index);
+-	      fwd = bck->fd;
++  /* Call malloc with worst case padding to hit alignment. */
++  m = (char *) (_int_malloc (av, nb + alignment + MINSIZE));
+ 
+-	      while (fwd != bck)
+-		{
+-		  int extra;
+-
+-		  if (chunksize (fwd) < nb)
+-		    break;
+-		  extra = chunk_ok_for_memalign (fwd, alignment, nb);
+-		  if (extra > 0
+-		      && (extra <= best_size || best == NULL))
+-		    {
+-		      best = fwd;
+-		      best_size = extra;
+-		    }
++  if (m == 0)
++    return 0;           /* propagate failure */
+ 
+-		  fwd = fwd->fd;
+-		}
+-	      victim = best;
+-
+-	      if (victim != NULL)
+-		{
+-		  unlink_chunk (av, victim);
+-		  break;
+-		}
+-	    }
+-
+-	  if (victim != NULL)
+-	    break;
+-	}
+-    }
+-
+-  /* Strategy: find a spot within that chunk that meets the alignment
+-     request, and then possibly free the leading and trailing space.
+-     This strategy is incredibly costly and can lead to external
+-     fragmentation if header and footer chunks are unused.  */
+-
+-  if (victim != NULL)
+-    {
+-      p = victim;
+-      m = chunk2mem (p);
+-      set_inuse (p);
+-      if (av != &main_arena)
+-	set_non_main_arena (p);
+-    }
+-  else
+-    {
+-      /* Call malloc with worst case padding to hit alignment. */
+-
+-      m = (char *) (_int_malloc (av, nb + alignment + MINSIZE));
+-
+-      if (m == 0)
+-	return 0;           /* propagate failure */
+-
+-      p = mem2chunk (m);
+-    }
++  p = mem2chunk (m);
+ 
+   if ((((unsigned long) (m)) % alignment) != 0)   /* misaligned */
+     {
+diff --git a/malloc/tst-memalign-2.c b/malloc/tst-memalign-2.c
+index f229283dbf..ecd6fa249e 100644
+--- a/malloc/tst-memalign-2.c
++++ b/malloc/tst-memalign-2.c
+@@ -86,7 +86,8 @@ do_test (void)
+       TEST_VERIFY (tcache_allocs[i].ptr1 == tcache_allocs[i].ptr2);
+     }
+ 
+-  /* Test for non-head tcache hits.  */
++  /* Test for non-head tcache hits.  This exercises the memalign
++     scanning code to find matching allocations.  */
+   for (i = 0; i < array_length (ptr); ++ i)
+     {
+       if (i == 4)
+@@ -113,7 +114,9 @@ do_test (void)
+   free (p);
+   TEST_VERIFY (count > 0);
+ 
+-  /* Large bins test.  */
++  /* Large bins test.  This verifies that the over-allocated parts
++     that memalign releases for future allocations can be reused by
++     memalign itself at least in some cases.  */
+ 
+   for (i = 0; i < LN; ++ i)
+     {
diff --git a/srcpkgs/glibc/template b/srcpkgs/glibc/template
index 20805fb52e816..cf7cd073a9500 100644
--- a/srcpkgs/glibc/template
+++ b/srcpkgs/glibc/template
@@ -1,7 +1,7 @@
 # Template file for 'glibc'
 pkgname=glibc
 version=2.38
-revision=1
+revision=2
 bootstrap=yes
 short_desc="GNU C library"
 maintainer="Enno Boland <gottox@voidlinux.org>"

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PR PATCH] [Updated] glibc: fix memalign performance regression
  2023-12-26 22:40 [PR PATCH] glibc: fix memalign performance regression tornaria
@ 2023-12-26 22:52 ` tornaria
  2023-12-26 23:23 ` [PR PATCH] [Updated] [ci skip] " tornaria
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: tornaria @ 2023-12-26 22:52 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 1941 bytes --]

There is an updated pull request by tornaria against master on the void-packages repository

https://github.com/tornaria/void-packages glibc
https://github.com/void-linux/void-packages/pull/47914

glibc: fix memalign performance regression

The upgrade to glibc 2.38 brought a severe performance regression in SageMath:
```
$ time python -c 'from sage.graphs.generators.distance_regular import DoubleGrassmannGraph; print(DoubleGrassmannGraph(2,2))'
<string>:1: UserWarning: Resolving lazy import GF during startup
<string>:1: UserWarning: Resolving lazy import VectorSpace during startup
Double Grassmann graph (5, 2, 2)

real	0m30.101s
user	0m29.959s
sys	0m0.060s
```
The same command takes about 1-2 seconds with glibc 2.36, and again with 2.38 after this PR.

Thanks to @oreo639 for identifying the cause as glibc bug 30723: https://sourceware.org/bugzilla/show_bug.cgi?id=30723

With the two upstream patches for that bug applied, all the performance regressions I was seeing are gone.

#### Testing the changes
- I tested the changes in this PR: **briefly**


A patch file from https://github.com/void-linux/void-packages/pull/47914.patch is attached

[-- Attachment #2: github-pr-glibc-47914.patch --]
[-- Type: text/x-diff, Size: 28345 bytes --]

From f957c22118b590ccf808bf95f0608a6e27163a02 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:28:39 -0300
Subject: [PATCH 1/9] glibc: fix performance regression in posix_memalign with
 long free lists

Thanks @oreo639
---
 ...1105852568c3ebc712225ae78b8c8ba31a78.patch | 296 ++++++++++++++++++
 ...fc1cf094406a138e4d1bcf9553e59edcf89d.patch | 252 +++++++++++++++
 srcpkgs/glibc/template                        |   2 +-
 3 files changed, 549 insertions(+), 1 deletion(-)
 create mode 100644 srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch
 create mode 100644 srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch

diff --git a/srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch b/srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch
new file mode 100644
index 0000000000000..56d5d47c031a0
--- /dev/null
+++ b/srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch
@@ -0,0 +1,296 @@
+From 542b1105852568c3ebc712225ae78b8c8ba31a78 Mon Sep 17 00:00:00 2001
+From: Florian Weimer <fweimer@redhat.com>
+Date: Fri, 11 Aug 2023 11:18:17 +0200
+Subject: [PATCH] malloc: Enable merging of remainders in memalign (bug 30723)
+
+Previously, calling _int_free from _int_memalign could put remainders
+into the tcache or into fastbins, where they are invisible to the
+low-level allocator.  This results in missed merge opportunities
+because once these freed chunks become available to the low-level
+allocator, further memalign allocations (even of the same size are)
+likely obstructing merges.
+
+Furthermore, during forwards merging in _int_memalign, do not
+completely give up when the remainder is too small to serve as a
+chunk on its own.  We can still give it back if it can be merged
+with the following unused chunk.  This makes it more likely that
+memalign calls in a loop achieve a compact memory layout,
+independently of initial heap layout.
+
+Drop some useless (unsigned long) casts along the way, and tweak
+the style to more closely match GNU on changed lines.
+
+Reviewed-by: DJ Delorie <dj@redhat.com>
+---
+ malloc/malloc.c | 197 +++++++++++++++++++++++++++++-------------------
+ 1 file changed, 121 insertions(+), 76 deletions(-)
+
+diff --git a/malloc/malloc.c b/malloc/malloc.c
+index e2f1a615a4..948f9759af 100644
+--- a/malloc/malloc.c
++++ b/malloc/malloc.c
+@@ -1086,6 +1086,11 @@ typedef struct malloc_chunk* mchunkptr;
+ 
+ static void*  _int_malloc(mstate, size_t);
+ static void     _int_free(mstate, mchunkptr, int);
++static void _int_free_merge_chunk (mstate, mchunkptr, INTERNAL_SIZE_T);
++static INTERNAL_SIZE_T _int_free_create_chunk (mstate,
++					       mchunkptr, INTERNAL_SIZE_T,
++					       mchunkptr, INTERNAL_SIZE_T);
++static void _int_free_maybe_consolidate (mstate, INTERNAL_SIZE_T);
+ static void*  _int_realloc(mstate, mchunkptr, INTERNAL_SIZE_T,
+ 			   INTERNAL_SIZE_T);
+ static void*  _int_memalign(mstate, size_t, size_t);
+@@ -4637,31 +4642,52 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+     if (!have_lock)
+       __libc_lock_lock (av->mutex);
+ 
+-    nextchunk = chunk_at_offset(p, size);
+-
+-    /* Lightweight tests: check whether the block is already the
+-       top block.  */
+-    if (__glibc_unlikely (p == av->top))
+-      malloc_printerr ("double free or corruption (top)");
+-    /* Or whether the next chunk is beyond the boundaries of the arena.  */
+-    if (__builtin_expect (contiguous (av)
+-			  && (char *) nextchunk
+-			  >= ((char *) av->top + chunksize(av->top)), 0))
+-	malloc_printerr ("double free or corruption (out)");
+-    /* Or whether the block is actually not marked used.  */
+-    if (__glibc_unlikely (!prev_inuse(nextchunk)))
+-      malloc_printerr ("double free or corruption (!prev)");
+-
+-    nextsize = chunksize(nextchunk);
+-    if (__builtin_expect (chunksize_nomask (nextchunk) <= CHUNK_HDR_SZ, 0)
+-	|| __builtin_expect (nextsize >= av->system_mem, 0))
+-      malloc_printerr ("free(): invalid next size (normal)");
++    _int_free_merge_chunk (av, p, size);
+ 
+-    free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
++    if (!have_lock)
++      __libc_lock_unlock (av->mutex);
++  }
++  /*
++    If the chunk was allocated via mmap, release via munmap().
++  */
++
++  else {
++    munmap_chunk (p);
++  }
++}
++
++/* Try to merge chunk P of SIZE bytes with its neighbors.  Put the
++   resulting chunk on the appropriate bin list.  P must not be on a
++   bin list yet, and it can be in use.  */
++static void
++_int_free_merge_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size)
++{
++  mchunkptr nextchunk = chunk_at_offset(p, size);
++
++  /* Lightweight tests: check whether the block is already the
++     top block.  */
++  if (__glibc_unlikely (p == av->top))
++    malloc_printerr ("double free or corruption (top)");
++  /* Or whether the next chunk is beyond the boundaries of the arena.  */
++  if (__builtin_expect (contiguous (av)
++			&& (char *) nextchunk
++			>= ((char *) av->top + chunksize(av->top)), 0))
++    malloc_printerr ("double free or corruption (out)");
++  /* Or whether the block is actually not marked used.  */
++  if (__glibc_unlikely (!prev_inuse(nextchunk)))
++    malloc_printerr ("double free or corruption (!prev)");
++
++  INTERNAL_SIZE_T nextsize = chunksize(nextchunk);
++  if (__builtin_expect (chunksize_nomask (nextchunk) <= CHUNK_HDR_SZ, 0)
++      || __builtin_expect (nextsize >= av->system_mem, 0))
++    malloc_printerr ("free(): invalid next size (normal)");
++
++  free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
+ 
+-    /* consolidate backward */
+-    if (!prev_inuse(p)) {
+-      prevsize = prev_size (p);
++  /* Consolidate backward.  */
++  if (!prev_inuse(p))
++    {
++      INTERNAL_SIZE_T prevsize = prev_size (p);
+       size += prevsize;
+       p = chunk_at_offset(p, -((long) prevsize));
+       if (__glibc_unlikely (chunksize(p) != prevsize))
+@@ -4669,9 +4695,25 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+       unlink_chunk (av, p);
+     }
+ 
+-    if (nextchunk != av->top) {
++  /* Write the chunk header, maybe after merging with the following chunk.  */
++  size = _int_free_create_chunk (av, p, size, nextchunk, nextsize);
++  _int_free_maybe_consolidate (av, size);
++}
++
++/* Create a chunk at P of SIZE bytes, with SIZE potentially increased
++   to cover the immediately following chunk NEXTCHUNK of NEXTSIZE
++   bytes (if NEXTCHUNK is unused).  The chunk at P is not actually
++   read and does not have to be initialized.  After creation, it is
++   placed on the appropriate bin list.  The function returns the size
++   of the new chunk.  */
++static INTERNAL_SIZE_T
++_int_free_create_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size,
++			mchunkptr nextchunk, INTERNAL_SIZE_T nextsize)
++{
++  if (nextchunk != av->top)
++    {
+       /* get and clear inuse bit */
+-      nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
++      bool nextinuse = inuse_bit_at_offset (nextchunk, nextsize);
+ 
+       /* consolidate forward */
+       if (!nextinuse) {
+@@ -4686,8 +4728,8 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+ 	been given one chance to be used in malloc.
+       */
+ 
+-      bck = unsorted_chunks(av);
+-      fwd = bck->fd;
++      mchunkptr bck = unsorted_chunks (av);
++      mchunkptr fwd = bck->fd;
+       if (__glibc_unlikely (fwd->bk != bck))
+ 	malloc_printerr ("free(): corrupted unsorted chunks");
+       p->fd = fwd;
+@@ -4706,61 +4748,52 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+       check_free_chunk(av, p);
+     }
+ 
+-    /*
+-      If the chunk borders the current high end of memory,
+-      consolidate into top
+-    */
+-
+-    else {
++  else
++    {
++      /* If the chunk borders the current high end of memory,
++	 consolidate into top.  */
+       size += nextsize;
+       set_head(p, size | PREV_INUSE);
+       av->top = p;
+       check_chunk(av, p);
+     }
+ 
+-    /*
+-      If freeing a large space, consolidate possibly-surrounding
+-      chunks. Then, if the total unused topmost memory exceeds trim
+-      threshold, ask malloc_trim to reduce top.
+-
+-      Unless max_fast is 0, we don't know if there are fastbins
+-      bordering top, so we cannot tell for sure whether threshold
+-      has been reached unless fastbins are consolidated.  But we
+-      don't want to consolidate on each free.  As a compromise,
+-      consolidation is performed if FASTBIN_CONSOLIDATION_THRESHOLD
+-      is reached.
+-    */
++  return size;
++}
+ 
+-    if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) {
++/* If freeing a large space, consolidate possibly-surrounding
++   chunks.  Then, if the total unused topmost memory exceeds trim
++   threshold, ask malloc_trim to reduce top.  */
++static void
++_int_free_maybe_consolidate (mstate av, INTERNAL_SIZE_T size)
++{
++  /* Unless max_fast is 0, we don't know if there are fastbins
++     bordering top, so we cannot tell for sure whether threshold has
++     been reached unless fastbins are consolidated.  But we don't want
++     to consolidate on each free.  As a compromise, consolidation is
++     performed if FASTBIN_CONSOLIDATION_THRESHOLD is reached.  */
++  if (size >= FASTBIN_CONSOLIDATION_THRESHOLD)
++    {
+       if (atomic_load_relaxed (&av->have_fastchunks))
+ 	malloc_consolidate(av);
+ 
+-      if (av == &main_arena) {
++      if (av == &main_arena)
++	{
+ #ifndef MORECORE_CANNOT_TRIM
+-	if ((unsigned long)(chunksize(av->top)) >=
+-	    (unsigned long)(mp_.trim_threshold))
+-	  systrim(mp_.top_pad, av);
++	  if (chunksize (av->top) >= mp_.trim_threshold)
++	    systrim (mp_.top_pad, av);
+ #endif
+-      } else {
+-	/* Always try heap_trim(), even if the top chunk is not
+-	   large, because the corresponding heap might go away.  */
+-	heap_info *heap = heap_for_ptr(top(av));
++	}
++      else
++	{
++	  /* Always try heap_trim, even if the top chunk is not large,
++	     because the corresponding heap might go away.  */
++	  heap_info *heap = heap_for_ptr (top (av));
+ 
+-	assert(heap->ar_ptr == av);
+-	heap_trim(heap, mp_.top_pad);
+-      }
++	  assert (heap->ar_ptr == av);
++	  heap_trim (heap, mp_.top_pad);
++	}
+     }
+-
+-    if (!have_lock)
+-      __libc_lock_unlock (av->mutex);
+-  }
+-  /*
+-    If the chunk was allocated via mmap, release via munmap().
+-  */
+-
+-  else {
+-    munmap_chunk (p);
+-  }
+ }
+ 
+ /*
+@@ -5221,7 +5254,7 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+                 (av != &main_arena ? NON_MAIN_ARENA : 0));
+       set_inuse_bit_at_offset (newp, newsize);
+       set_head_size (p, leadsize | (av != &main_arena ? NON_MAIN_ARENA : 0));
+-      _int_free (av, p, 1);
++      _int_free_merge_chunk (av, p, leadsize);
+       p = newp;
+ 
+       assert (newsize >= nb &&
+@@ -5232,15 +5265,27 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+   if (!chunk_is_mmapped (p))
+     {
+       size = chunksize (p);
+-      if ((unsigned long) (size) > (unsigned long) (nb + MINSIZE))
++      mchunkptr nextchunk = chunk_at_offset(p, size);
++      INTERNAL_SIZE_T nextsize = chunksize(nextchunk);
++      if (size > nb)
+         {
+           remainder_size = size - nb;
+-          remainder = chunk_at_offset (p, nb);
+-          set_head (remainder, remainder_size | PREV_INUSE |
+-                    (av != &main_arena ? NON_MAIN_ARENA : 0));
+-          set_head_size (p, nb);
+-          _int_free (av, remainder, 1);
+-        }
++	  if (remainder_size >= MINSIZE
++	      || nextchunk == av->top
++	      || !inuse_bit_at_offset (nextchunk, nextsize))
++	    {
++	      /* We can only give back the tail if it is larger than
++		 MINSIZE, or if the following chunk is unused (top
++		 chunk or unused in-heap chunk).  Otherwise we would
++		 create a chunk that is smaller than MINSIZE.  */
++	      remainder = chunk_at_offset (p, nb);
++	      set_head_size (p, nb);
++	      remainder_size = _int_free_create_chunk (av, remainder,
++						       remainder_size,
++						       nextchunk, nextsize);
++	      _int_free_maybe_consolidate (av, remainder_size);
++	    }
++	}
+     }
+ 
+   check_inuse_chunk (av, p);
diff --git a/srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch b/srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch
new file mode 100644
index 0000000000000..4615c7b035cc7
--- /dev/null
+++ b/srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch
@@ -0,0 +1,252 @@
+From 0dc7fc1cf094406a138e4d1bcf9553e59edcf89d Mon Sep 17 00:00:00 2001
+From: Florian Weimer <fweimer@redhat.com>
+Date: Thu, 10 Aug 2023 19:36:56 +0200
+Subject: [PATCH] malloc: Remove bin scanning from memalign (bug 30723)
+
+On the test workload (mpv --cache=yes with VP9 video decoding), the
+bin scanning has a very poor success rate (less than 2%).  The tcache
+scanning has about 50% success rate, so keep that.
+
+Update comments in malloc/tst-memalign-2 to indicate the purpose
+of the tests.  Even with the scanning removed, the additional
+merging opportunities since commit 542b1105852568c3ebc712225ae78b
+("malloc: Enable merging of remainders in memalign (bug 30723)")
+are sufficient to pass the existing large bins test.
+
+Remove leftover variables from _int_free from refactoring in the
+same commit.
+
+Reviewed-by: DJ Delorie <dj@redhat.com>
+---
+ malloc/malloc.c         | 169 ++--------------------------------------
+ malloc/tst-memalign-2.c |   7 +-
+ 2 files changed, 10 insertions(+), 166 deletions(-)
+
+diff --git a/malloc/malloc.c b/malloc/malloc.c
+index 948f9759af..d0bbbf3710 100644
+--- a/malloc/malloc.c
++++ b/malloc/malloc.c
+@@ -4488,12 +4488,6 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+ {
+   INTERNAL_SIZE_T size;        /* its size */
+   mfastbinptr *fb;             /* associated fastbin */
+-  mchunkptr nextchunk;         /* next contiguous chunk */
+-  INTERNAL_SIZE_T nextsize;    /* its size */
+-  int nextinuse;               /* true if nextchunk is used */
+-  INTERNAL_SIZE_T prevsize;    /* size of previous contiguous chunk */
+-  mchunkptr bck;               /* misc temp for linking */
+-  mchunkptr fwd;               /* misc temp for linking */
+ 
+   size = chunksize (p);
+ 
+@@ -5032,42 +5026,6 @@ _int_realloc (mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
+    ------------------------------ memalign ------------------------------
+  */
+ 
+-/* Returns 0 if the chunk is not and does not contain the requested
+-   aligned sub-chunk, else returns the amount of "waste" from
+-   trimming.  NB is the *chunk* byte size, not the user byte
+-   size.  */
+-static size_t
+-chunk_ok_for_memalign (mchunkptr p, size_t alignment, size_t nb)
+-{
+-  void *m = chunk2mem (p);
+-  INTERNAL_SIZE_T size = chunksize (p);
+-  void *aligned_m = m;
+-
+-  if (__glibc_unlikely (misaligned_chunk (p)))
+-    malloc_printerr ("_int_memalign(): unaligned chunk detected");
+-
+-  aligned_m = PTR_ALIGN_UP (m, alignment);
+-
+-  INTERNAL_SIZE_T front_extra = (intptr_t) aligned_m - (intptr_t) m;
+-
+-  /* We can't trim off the front as it's too small.  */
+-  if (front_extra > 0 && front_extra < MINSIZE)
+-    return 0;
+-
+-  /* If it's a perfect fit, it's an exception to the return value rule
+-     (we would return zero waste, which looks like "not usable"), so
+-     handle it here by returning a small non-zero value instead.  */
+-  if (size == nb && front_extra == 0)
+-    return 1;
+-
+-  /* If the block we need fits in the chunk, calculate total waste.  */
+-  if (size > nb + front_extra)
+-    return size - nb;
+-
+-  /* Can't use this chunk.  */
+-  return 0;
+-}
+-
+ /* BYTES is user requested bytes, not requested chunksize bytes.  */
+ static void *
+ _int_memalign (mstate av, size_t alignment, size_t bytes)
+@@ -5082,7 +5040,6 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+   mchunkptr remainder;            /* spare room at end to split off */
+   unsigned long remainder_size;   /* its size */
+   INTERNAL_SIZE_T size;
+-  mchunkptr victim;
+ 
+   nb = checked_request2size (bytes);
+   if (nb == 0)
+@@ -5101,129 +5058,13 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+      we don't find anything in those bins, the common malloc code will
+      scan starting at 2x.  */
+ 
+-  /* This will be set if we found a candidate chunk.  */
+-  victim = NULL;
+-
+-  /* Fast bins are singly-linked, hard to remove a chunk from the middle
+-     and unlikely to meet our alignment requirements.  We have not done
+-     any experimentation with searching for aligned fastbins.  */
+-
+-  if (av != NULL)
+-    {
+-      int first_bin_index;
+-      int first_largebin_index;
+-      int last_bin_index;
+-
+-      if (in_smallbin_range (nb))
+-	first_bin_index = smallbin_index (nb);
+-      else
+-	first_bin_index = largebin_index (nb);
+-
+-      if (in_smallbin_range (nb * 2))
+-	last_bin_index = smallbin_index (nb * 2);
+-      else
+-	last_bin_index = largebin_index (nb * 2);
+-
+-      first_largebin_index = largebin_index (MIN_LARGE_SIZE);
+-
+-      int victim_index;                 /* its bin index */
+-
+-      for (victim_index = first_bin_index;
+-	   victim_index < last_bin_index;
+-	   victim_index ++)
+-	{
+-	  victim = NULL;
+-
+-	  if (victim_index < first_largebin_index)
+-	    {
+-	      /* Check small bins.  Small bin chunks are doubly-linked despite
+-		 being the same size.  */
+-
+-	      mchunkptr fwd;                    /* misc temp for linking */
+-	      mchunkptr bck;                    /* misc temp for linking */
+-
+-	      bck = bin_at (av, victim_index);
+-	      fwd = bck->fd;
+-	      while (fwd != bck)
+-		{
+-		  if (chunk_ok_for_memalign (fwd, alignment, nb) > 0)
+-		    {
+-		      victim = fwd;
+-
+-		      /* Unlink it */
+-		      victim->fd->bk = victim->bk;
+-		      victim->bk->fd = victim->fd;
+-		      break;
+-		    }
+-
+-		  fwd = fwd->fd;
+-		}
+-	    }
+-	  else
+-	    {
+-	      /* Check large bins.  */
+-	      mchunkptr fwd;                    /* misc temp for linking */
+-	      mchunkptr bck;                    /* misc temp for linking */
+-	      mchunkptr best = NULL;
+-	      size_t best_size = 0;
+-
+-	      bck = bin_at (av, victim_index);
+-	      fwd = bck->fd;
++  /* Call malloc with worst case padding to hit alignment. */
++  m = (char *) (_int_malloc (av, nb + alignment + MINSIZE));
+ 
+-	      while (fwd != bck)
+-		{
+-		  int extra;
+-
+-		  if (chunksize (fwd) < nb)
+-		    break;
+-		  extra = chunk_ok_for_memalign (fwd, alignment, nb);
+-		  if (extra > 0
+-		      && (extra <= best_size || best == NULL))
+-		    {
+-		      best = fwd;
+-		      best_size = extra;
+-		    }
++  if (m == 0)
++    return 0;           /* propagate failure */
+ 
+-		  fwd = fwd->fd;
+-		}
+-	      victim = best;
+-
+-	      if (victim != NULL)
+-		{
+-		  unlink_chunk (av, victim);
+-		  break;
+-		}
+-	    }
+-
+-	  if (victim != NULL)
+-	    break;
+-	}
+-    }
+-
+-  /* Strategy: find a spot within that chunk that meets the alignment
+-     request, and then possibly free the leading and trailing space.
+-     This strategy is incredibly costly and can lead to external
+-     fragmentation if header and footer chunks are unused.  */
+-
+-  if (victim != NULL)
+-    {
+-      p = victim;
+-      m = chunk2mem (p);
+-      set_inuse (p);
+-      if (av != &main_arena)
+-	set_non_main_arena (p);
+-    }
+-  else
+-    {
+-      /* Call malloc with worst case padding to hit alignment. */
+-
+-      m = (char *) (_int_malloc (av, nb + alignment + MINSIZE));
+-
+-      if (m == 0)
+-	return 0;           /* propagate failure */
+-
+-      p = mem2chunk (m);
+-    }
++  p = mem2chunk (m);
+ 
+   if ((((unsigned long) (m)) % alignment) != 0)   /* misaligned */
+     {
+diff --git a/malloc/tst-memalign-2.c b/malloc/tst-memalign-2.c
+index f229283dbf..ecd6fa249e 100644
+--- a/malloc/tst-memalign-2.c
++++ b/malloc/tst-memalign-2.c
+@@ -86,7 +86,8 @@ do_test (void)
+       TEST_VERIFY (tcache_allocs[i].ptr1 == tcache_allocs[i].ptr2);
+     }
+ 
+-  /* Test for non-head tcache hits.  */
++  /* Test for non-head tcache hits.  This exercises the memalign
++     scanning code to find matching allocations.  */
+   for (i = 0; i < array_length (ptr); ++ i)
+     {
+       if (i == 4)
+@@ -113,7 +114,9 @@ do_test (void)
+   free (p);
+   TEST_VERIFY (count > 0);
+ 
+-  /* Large bins test.  */
++  /* Large bins test.  This verifies that the over-allocated parts
++     that memalign releases for future allocations can be reused by
++     memalign itself at least in some cases.  */
+ 
+   for (i = 0; i < LN; ++ i)
+     {
diff --git a/srcpkgs/glibc/template b/srcpkgs/glibc/template
index 20805fb52e816..cf7cd073a9500 100644
--- a/srcpkgs/glibc/template
+++ b/srcpkgs/glibc/template
@@ -1,7 +1,7 @@
 # Template file for 'glibc'
 pkgname=glibc
 version=2.38
-revision=1
+revision=2
 bootstrap=yes
 short_desc="GNU C library"
 maintainer="Enno Boland <gottox@voidlinux.org>"

From ff904ab744dd04a5c17083a70272e2ff86e453f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 2/9] cross-aarch64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-aarch64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-aarch64-linux-gnu/template b/srcpkgs/cross-aarch64-linux-gnu/template
index 690e27b8adc15..91dadf4b01179 100644
--- a/srcpkgs/cross-aarch64-linux-gnu/template
+++ b/srcpkgs/cross-aarch64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-aarch64-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--with-arch=armv8-a"
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"

From 548ba21a512a08be25773095fc183be435f967fd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 3/9] cross-i686-pc-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-i686-pc-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-i686-pc-linux-gnu/template b/srcpkgs/cross-i686-pc-linux-gnu/template
index c6b5319ac5d78..c695fb2f37a20 100644
--- a/srcpkgs/cross-i686-pc-linux-gnu/template
+++ b/srcpkgs/cross-i686-pc-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-i686-pc-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"
 makedepends="isl-devel libmpc-devel gmp-devel mpfr-devel

From f72ee165065f2ae7b744bf85244ad372299030ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 4/9] cross-powerpc-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpc-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpc-linux-gnu/template b/srcpkgs/cross-powerpc-linux-gnu/template
index 31578760f17e5..337d1c652ab91 100644
--- a/srcpkgs/cross-powerpc-linux-gnu/template
+++ b/srcpkgs/cross-powerpc-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpc-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify
  --enable-autolink-libatomic"

From 581c69600ea2b4f21fbe69c735d17fca9b9c8dd4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 5/9] cross-powerpc64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpc64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpc64-linux-gnu/template b/srcpkgs/cross-powerpc64-linux-gnu/template
index cdbd1e26f725d..f4af015668699 100644
--- a/srcpkgs/cross-powerpc64-linux-gnu/template
+++ b/srcpkgs/cross-powerpc64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpc64-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify --with-abi=elfv2
  --enable-targets=powerpc-linux --enable-autolink-libatomic"

From 97ed7c4299283cafada2d5967b12fcf669ff3baa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 6/9] cross-powerpc64le-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpc64le-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpc64le-linux-gnu/template b/srcpkgs/cross-powerpc64le-linux-gnu/template
index 82dc196a247d5..6181e0e4d236d 100644
--- a/srcpkgs/cross-powerpc64le-linux-gnu/template
+++ b/srcpkgs/cross-powerpc64le-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpc64le-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify --with-abi=elfv2
  --enable-targets=powerpcle-linux --enable-autolink-libatomic"

From c091976bf36fce2e3ada23e2b1af5e5f7d56ed70 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:08 -0300
Subject: [PATCH 7/9] cross-powerpcle-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpcle-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpcle-linux-gnu/template b/srcpkgs/cross-powerpcle-linux-gnu/template
index 7576278738d95..720cf97873314 100644
--- a/srcpkgs/cross-powerpcle-linux-gnu/template
+++ b/srcpkgs/cross-powerpcle-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpcle-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify
  --enable-autolink-libatomic"

From 9d4e85a7467ad1918b5bc7b652c6c0690c93e080 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:08 -0300
Subject: [PATCH 8/9] cross-riscv64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-riscv64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-riscv64-linux-gnu/template b/srcpkgs/cross-riscv64-linux-gnu/template
index 5cd6da69c32e8..bd1a010d98269 100644
--- a/srcpkgs/cross-riscv64-linux-gnu/template
+++ b/srcpkgs/cross-riscv64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-riscv64-linux-gnu
 version=0.35
-revision=4
+revision=5
 build_style=void-cross
 configure_args="--with-arch=rv64gc --with-abi=lp64d --enable-autolink-libatomic --disable-multilib"
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"

From 40bba1e91878a27e9ac162391754017fd5e4319f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:08 -0300
Subject: [PATCH 9/9] cross-x86_64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-x86_64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-x86_64-linux-gnu/template b/srcpkgs/cross-x86_64-linux-gnu/template
index 7eabe1625b23e..b7c2bb3f04fd7 100644
--- a/srcpkgs/cross-x86_64-linux-gnu/template
+++ b/srcpkgs/cross-x86_64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-x86_64-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"
 makedepends="isl-devel libmpc-devel gmp-devel mpfr-devel


* Re: [PR PATCH] [Updated] [ci skip] glibc: fix memalign performance regression
  2023-12-26 22:40 [PR PATCH] glibc: fix memalign performance regression tornaria
  2023-12-26 22:52 ` [PR PATCH] [Updated] " tornaria
@ 2023-12-26 23:23 ` tornaria
  2023-12-27  0:17 ` oreo639
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: tornaria @ 2023-12-26 23:23 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 1951 bytes --]

There is an updated pull request by tornaria against master on the void-packages repository

https://github.com/tornaria/void-packages glibc
https://github.com/void-linux/void-packages/pull/47914

[ci skip] glibc: fix memalign performance regression

The upgrade to glibc 2.38 brought a severe performance regression in sagemath:
```
$ time python -c 'from sage.graphs.generators.distance_regular import DoubleGrassmannGraph; print(DoubleGrassmannGraph(2,2))'
<string>:1: UserWarning: Resolving lazy import GF during startup
<string>:1: UserWarning: Resolving lazy import VectorSpace during startup
Double Grassmann graph (5, 2, 2)

real	0m30.101s
user	0m29.959s
sys	0m0.060s
```
The same command takes about 1-2 seconds with glibc 2.36, or with 2.38 after this PR.

Thanks to @oreo639 for tracing it to glibc bug 30723: https://sourceware.org/bugzilla/show_bug.cgi?id=30723

With the two upstream patches for that bug applied, all the performance regressions I was seeing are gone.
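
For anyone who wants to poke at the allocator without installing sagemath, here is a minimal sketch of a `posix_memalign` churn loop. It is not the reproducer from the bug report and I have not checked that it hits the slow path; `memalign_churn` and `free_all` are hypothetical helpers, not part of glibc or of this PR. It allocates aligned blocks, frees every other one so that free chunks pile up, and then asks for aligned blocks again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: release every block and the table itself.  */
static void
free_all (void **blocks, size_t count)
{
  for (size_t i = 0; i < count; i++)
    free (blocks[i]);           /* free (NULL) is a no-op.  */
  free (blocks);
}

/* Hypothetical helper: allocate COUNT aligned blocks, free every other
   one, then refill the freed slots with new aligned blocks.  Returns
   the total number of successful aligned allocations, or -1 if any
   allocation fails or comes back misaligned.  */
static long
memalign_churn (size_t count, size_t alignment, size_t size)
{
  if (count == 0)
    return 0;

  void **blocks = calloc (count, sizeof *blocks);
  if (blocks == NULL)
    return -1;

  long ok = 0;

  /* Phase 1: fill the table with aligned blocks.  */
  for (size_t i = 0; i < count; i++)
    {
      void *p = NULL;
      if (posix_memalign (&p, alignment, size) != 0
          || (uintptr_t) p % alignment != 0)
        {
          free (p);
          free_all (blocks, count);
          return -1;
        }
      blocks[i] = p;
      ok++;
    }

  /* Phase 2: free every other block so free chunks accumulate.  */
  for (size_t i = 0; i < count; i += 2)
    {
      free (blocks[i]);
      blocks[i] = NULL;
    }

  /* Phase 3: request aligned blocks again for the freed slots.  */
  for (size_t i = 0; i < count; i += 2)
    {
      void *p = NULL;
      assert (blocks[i] == NULL);       /* Slot was released in phase 2.  */
      if (posix_memalign (&p, alignment, size) != 0
          || (uintptr_t) p % alignment != 0)
        {
          free (p);
          free_all (blocks, count);
          return -1;
        }
      blocks[i] = p;
      ok++;
    }

  free_all (blocks, count);
  return ok;
}
```

Calling something like `memalign_churn (1000000, 64, 100)` from a small `main` and running it under `time` on 2.38 with and without the patches would be one way to compare; the sizes are arbitrary and may need tuning before any difference shows up.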

#### Testing the changes
- I tested the changes in this PR: **briefly**



A patch file from https://github.com/void-linux/void-packages/pull/47914.patch is attached

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: github-pr-glibc-47914.patch --]
[-- Type: text/x-diff, Size: 31053 bytes --]

From f957c22118b590ccf808bf95f0608a6e27163a02 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:28:39 -0300
Subject: [PATCH 01/12] glibc: fix performance regression in posix_memalign
 with long free lists

Thanks @oreo639
---
 ...1105852568c3ebc712225ae78b8c8ba31a78.patch | 296 ++++++++++++++++++
 ...fc1cf094406a138e4d1bcf9553e59edcf89d.patch | 252 +++++++++++++++
 srcpkgs/glibc/template                        |   2 +-
 3 files changed, 549 insertions(+), 1 deletion(-)
 create mode 100644 srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch
 create mode 100644 srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch

diff --git a/srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch b/srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch
new file mode 100644
index 0000000000000..56d5d47c031a0
--- /dev/null
+++ b/srcpkgs/glibc/patches/30723-1-542b1105852568c3ebc712225ae78b8c8ba31a78.patch
@@ -0,0 +1,296 @@
+From 542b1105852568c3ebc712225ae78b8c8ba31a78 Mon Sep 17 00:00:00 2001
+From: Florian Weimer <fweimer@redhat.com>
+Date: Fri, 11 Aug 2023 11:18:17 +0200
+Subject: [PATCH] malloc: Enable merging of remainders in memalign (bug 30723)
+
+Previously, calling _int_free from _int_memalign could put remainders
+into the tcache or into fastbins, where they are invisible to the
+low-level allocator.  This results in missed merge opportunities
+because once these freed chunks become available to the low-level
+allocator, further memalign allocations (even of the same size) are
+likely obstructing merges.
+
+Furthermore, during forwards merging in _int_memalign, do not
+completely give up when the remainder is too small to serve as a
+chunk on its own.  We can still give it back if it can be merged
+with the following unused chunk.  This makes it more likely that
+memalign calls in a loop achieve a compact memory layout,
+independently of initial heap layout.
+
+Drop some useless (unsigned long) casts along the way, and tweak
+the style to more closely match GNU on changed lines.
+
+Reviewed-by: DJ Delorie <dj@redhat.com>
+---
+ malloc/malloc.c | 197 +++++++++++++++++++++++++++++-------------------
+ 1 file changed, 121 insertions(+), 76 deletions(-)
+
+diff --git a/malloc/malloc.c b/malloc/malloc.c
+index e2f1a615a4..948f9759af 100644
+--- a/malloc/malloc.c
++++ b/malloc/malloc.c
+@@ -1086,6 +1086,11 @@ typedef struct malloc_chunk* mchunkptr;
+ 
+ static void*  _int_malloc(mstate, size_t);
+ static void     _int_free(mstate, mchunkptr, int);
++static void _int_free_merge_chunk (mstate, mchunkptr, INTERNAL_SIZE_T);
++static INTERNAL_SIZE_T _int_free_create_chunk (mstate,
++					       mchunkptr, INTERNAL_SIZE_T,
++					       mchunkptr, INTERNAL_SIZE_T);
++static void _int_free_maybe_consolidate (mstate, INTERNAL_SIZE_T);
+ static void*  _int_realloc(mstate, mchunkptr, INTERNAL_SIZE_T,
+ 			   INTERNAL_SIZE_T);
+ static void*  _int_memalign(mstate, size_t, size_t);
+@@ -4637,31 +4642,52 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+     if (!have_lock)
+       __libc_lock_lock (av->mutex);
+ 
+-    nextchunk = chunk_at_offset(p, size);
+-
+-    /* Lightweight tests: check whether the block is already the
+-       top block.  */
+-    if (__glibc_unlikely (p == av->top))
+-      malloc_printerr ("double free or corruption (top)");
+-    /* Or whether the next chunk is beyond the boundaries of the arena.  */
+-    if (__builtin_expect (contiguous (av)
+-			  && (char *) nextchunk
+-			  >= ((char *) av->top + chunksize(av->top)), 0))
+-	malloc_printerr ("double free or corruption (out)");
+-    /* Or whether the block is actually not marked used.  */
+-    if (__glibc_unlikely (!prev_inuse(nextchunk)))
+-      malloc_printerr ("double free or corruption (!prev)");
+-
+-    nextsize = chunksize(nextchunk);
+-    if (__builtin_expect (chunksize_nomask (nextchunk) <= CHUNK_HDR_SZ, 0)
+-	|| __builtin_expect (nextsize >= av->system_mem, 0))
+-      malloc_printerr ("free(): invalid next size (normal)");
++    _int_free_merge_chunk (av, p, size);
+ 
+-    free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
++    if (!have_lock)
++      __libc_lock_unlock (av->mutex);
++  }
++  /*
++    If the chunk was allocated via mmap, release via munmap().
++  */
++
++  else {
++    munmap_chunk (p);
++  }
++}
++
++/* Try to merge chunk P of SIZE bytes with its neighbors.  Put the
++   resulting chunk on the appropriate bin list.  P must not be on a
++   bin list yet, and it can be in use.  */
++static void
++_int_free_merge_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size)
++{
++  mchunkptr nextchunk = chunk_at_offset(p, size);
++
++  /* Lightweight tests: check whether the block is already the
++     top block.  */
++  if (__glibc_unlikely (p == av->top))
++    malloc_printerr ("double free or corruption (top)");
++  /* Or whether the next chunk is beyond the boundaries of the arena.  */
++  if (__builtin_expect (contiguous (av)
++			&& (char *) nextchunk
++			>= ((char *) av->top + chunksize(av->top)), 0))
++    malloc_printerr ("double free or corruption (out)");
++  /* Or whether the block is actually not marked used.  */
++  if (__glibc_unlikely (!prev_inuse(nextchunk)))
++    malloc_printerr ("double free or corruption (!prev)");
++
++  INTERNAL_SIZE_T nextsize = chunksize(nextchunk);
++  if (__builtin_expect (chunksize_nomask (nextchunk) <= CHUNK_HDR_SZ, 0)
++      || __builtin_expect (nextsize >= av->system_mem, 0))
++    malloc_printerr ("free(): invalid next size (normal)");
++
++  free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
+ 
+-    /* consolidate backward */
+-    if (!prev_inuse(p)) {
+-      prevsize = prev_size (p);
++  /* Consolidate backward.  */
++  if (!prev_inuse(p))
++    {
++      INTERNAL_SIZE_T prevsize = prev_size (p);
+       size += prevsize;
+       p = chunk_at_offset(p, -((long) prevsize));
+       if (__glibc_unlikely (chunksize(p) != prevsize))
+@@ -4669,9 +4695,25 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+       unlink_chunk (av, p);
+     }
+ 
+-    if (nextchunk != av->top) {
++  /* Write the chunk header, maybe after merging with the following chunk.  */
++  size = _int_free_create_chunk (av, p, size, nextchunk, nextsize);
++  _int_free_maybe_consolidate (av, size);
++}
++
++/* Create a chunk at P of SIZE bytes, with SIZE potentially increased
++   to cover the immediately following chunk NEXTCHUNK of NEXTSIZE
++   bytes (if NEXTCHUNK is unused).  The chunk at P is not actually
++   read and does not have to be initialized.  After creation, it is
++   placed on the appropriate bin list.  The function returns the size
++   of the new chunk.  */
++static INTERNAL_SIZE_T
++_int_free_create_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size,
++			mchunkptr nextchunk, INTERNAL_SIZE_T nextsize)
++{
++  if (nextchunk != av->top)
++    {
+       /* get and clear inuse bit */
+-      nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
++      bool nextinuse = inuse_bit_at_offset (nextchunk, nextsize);
+ 
+       /* consolidate forward */
+       if (!nextinuse) {
+@@ -4686,8 +4728,8 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+ 	been given one chance to be used in malloc.
+       */
+ 
+-      bck = unsorted_chunks(av);
+-      fwd = bck->fd;
++      mchunkptr bck = unsorted_chunks (av);
++      mchunkptr fwd = bck->fd;
+       if (__glibc_unlikely (fwd->bk != bck))
+ 	malloc_printerr ("free(): corrupted unsorted chunks");
+       p->fd = fwd;
+@@ -4706,61 +4748,52 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+       check_free_chunk(av, p);
+     }
+ 
+-    /*
+-      If the chunk borders the current high end of memory,
+-      consolidate into top
+-    */
+-
+-    else {
++  else
++    {
++      /* If the chunk borders the current high end of memory,
++	 consolidate into top.  */
+       size += nextsize;
+       set_head(p, size | PREV_INUSE);
+       av->top = p;
+       check_chunk(av, p);
+     }
+ 
+-    /*
+-      If freeing a large space, consolidate possibly-surrounding
+-      chunks. Then, if the total unused topmost memory exceeds trim
+-      threshold, ask malloc_trim to reduce top.
+-
+-      Unless max_fast is 0, we don't know if there are fastbins
+-      bordering top, so we cannot tell for sure whether threshold
+-      has been reached unless fastbins are consolidated.  But we
+-      don't want to consolidate on each free.  As a compromise,
+-      consolidation is performed if FASTBIN_CONSOLIDATION_THRESHOLD
+-      is reached.
+-    */
++  return size;
++}
+ 
+-    if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) {
++/* If freeing a large space, consolidate possibly-surrounding
++   chunks.  Then, if the total unused topmost memory exceeds trim
++   threshold, ask malloc_trim to reduce top.  */
++static void
++_int_free_maybe_consolidate (mstate av, INTERNAL_SIZE_T size)
++{
++  /* Unless max_fast is 0, we don't know if there are fastbins
++     bordering top, so we cannot tell for sure whether threshold has
++     been reached unless fastbins are consolidated.  But we don't want
++     to consolidate on each free.  As a compromise, consolidation is
++     performed if FASTBIN_CONSOLIDATION_THRESHOLD is reached.  */
++  if (size >= FASTBIN_CONSOLIDATION_THRESHOLD)
++    {
+       if (atomic_load_relaxed (&av->have_fastchunks))
+ 	malloc_consolidate(av);
+ 
+-      if (av == &main_arena) {
++      if (av == &main_arena)
++	{
+ #ifndef MORECORE_CANNOT_TRIM
+-	if ((unsigned long)(chunksize(av->top)) >=
+-	    (unsigned long)(mp_.trim_threshold))
+-	  systrim(mp_.top_pad, av);
++	  if (chunksize (av->top) >= mp_.trim_threshold)
++	    systrim (mp_.top_pad, av);
+ #endif
+-      } else {
+-	/* Always try heap_trim(), even if the top chunk is not
+-	   large, because the corresponding heap might go away.  */
+-	heap_info *heap = heap_for_ptr(top(av));
++	}
++      else
++	{
++	  /* Always try heap_trim, even if the top chunk is not large,
++	     because the corresponding heap might go away.  */
++	  heap_info *heap = heap_for_ptr (top (av));
+ 
+-	assert(heap->ar_ptr == av);
+-	heap_trim(heap, mp_.top_pad);
+-      }
++	  assert (heap->ar_ptr == av);
++	  heap_trim (heap, mp_.top_pad);
++	}
+     }
+-
+-    if (!have_lock)
+-      __libc_lock_unlock (av->mutex);
+-  }
+-  /*
+-    If the chunk was allocated via mmap, release via munmap().
+-  */
+-
+-  else {
+-    munmap_chunk (p);
+-  }
+ }
+ 
+ /*
+@@ -5221,7 +5254,7 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+                 (av != &main_arena ? NON_MAIN_ARENA : 0));
+       set_inuse_bit_at_offset (newp, newsize);
+       set_head_size (p, leadsize | (av != &main_arena ? NON_MAIN_ARENA : 0));
+-      _int_free (av, p, 1);
++      _int_free_merge_chunk (av, p, leadsize);
+       p = newp;
+ 
+       assert (newsize >= nb &&
+@@ -5232,15 +5265,27 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+   if (!chunk_is_mmapped (p))
+     {
+       size = chunksize (p);
+-      if ((unsigned long) (size) > (unsigned long) (nb + MINSIZE))
++      mchunkptr nextchunk = chunk_at_offset(p, size);
++      INTERNAL_SIZE_T nextsize = chunksize(nextchunk);
++      if (size > nb)
+         {
+           remainder_size = size - nb;
+-          remainder = chunk_at_offset (p, nb);
+-          set_head (remainder, remainder_size | PREV_INUSE |
+-                    (av != &main_arena ? NON_MAIN_ARENA : 0));
+-          set_head_size (p, nb);
+-          _int_free (av, remainder, 1);
+-        }
++	  if (remainder_size >= MINSIZE
++	      || nextchunk == av->top
++	      || !inuse_bit_at_offset (nextchunk, nextsize))
++	    {
++	      /* We can only give back the tail if it is larger than
++		 MINSIZE, or if the following chunk is unused (top
++		 chunk or unused in-heap chunk).  Otherwise we would
++		 create a chunk that is smaller than MINSIZE.  */
++	      remainder = chunk_at_offset (p, nb);
++	      set_head_size (p, nb);
++	      remainder_size = _int_free_create_chunk (av, remainder,
++						       remainder_size,
++						       nextchunk, nextsize);
++	      _int_free_maybe_consolidate (av, remainder_size);
++	    }
++	}
+     }
+ 
+   check_inuse_chunk (av, p);
diff --git a/srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch b/srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch
new file mode 100644
index 0000000000000..4615c7b035cc7
--- /dev/null
+++ b/srcpkgs/glibc/patches/30723-2-0dc7fc1cf094406a138e4d1bcf9553e59edcf89d.patch
@@ -0,0 +1,252 @@
+From 0dc7fc1cf094406a138e4d1bcf9553e59edcf89d Mon Sep 17 00:00:00 2001
+From: Florian Weimer <fweimer@redhat.com>
+Date: Thu, 10 Aug 2023 19:36:56 +0200
+Subject: [PATCH] malloc: Remove bin scanning from memalign (bug 30723)
+
+On the test workload (mpv --cache=yes with VP9 video decoding), the
+bin scanning has a very poor success rate (less than 2%).  The tcache
+scanning has about 50% success rate, so keep that.
+
+Update comments in malloc/tst-memalign-2 to indicate the purpose
+of the tests.  Even with the scanning removed, the additional
+merging opportunities since commit 542b1105852568c3ebc712225ae78b
+("malloc: Enable merging of remainders in memalign (bug 30723)")
+are sufficient to pass the existing large bins test.
+
+Remove leftover variables from _int_free from refactoring in the
+same commit.
+
+Reviewed-by: DJ Delorie <dj@redhat.com>
+---
+ malloc/malloc.c         | 169 ++--------------------------------------
+ malloc/tst-memalign-2.c |   7 +-
+ 2 files changed, 10 insertions(+), 166 deletions(-)
+
+diff --git a/malloc/malloc.c b/malloc/malloc.c
+index 948f9759af..d0bbbf3710 100644
+--- a/malloc/malloc.c
++++ b/malloc/malloc.c
+@@ -4488,12 +4488,6 @@ _int_free (mstate av, mchunkptr p, int have_lock)
+ {
+   INTERNAL_SIZE_T size;        /* its size */
+   mfastbinptr *fb;             /* associated fastbin */
+-  mchunkptr nextchunk;         /* next contiguous chunk */
+-  INTERNAL_SIZE_T nextsize;    /* its size */
+-  int nextinuse;               /* true if nextchunk is used */
+-  INTERNAL_SIZE_T prevsize;    /* size of previous contiguous chunk */
+-  mchunkptr bck;               /* misc temp for linking */
+-  mchunkptr fwd;               /* misc temp for linking */
+ 
+   size = chunksize (p);
+ 
+@@ -5032,42 +5026,6 @@ _int_realloc (mstate av, mchunkptr oldp, INTERNAL_SIZE_T oldsize,
+    ------------------------------ memalign ------------------------------
+  */
+ 
+-/* Returns 0 if the chunk is not and does not contain the requested
+-   aligned sub-chunk, else returns the amount of "waste" from
+-   trimming.  NB is the *chunk* byte size, not the user byte
+-   size.  */
+-static size_t
+-chunk_ok_for_memalign (mchunkptr p, size_t alignment, size_t nb)
+-{
+-  void *m = chunk2mem (p);
+-  INTERNAL_SIZE_T size = chunksize (p);
+-  void *aligned_m = m;
+-
+-  if (__glibc_unlikely (misaligned_chunk (p)))
+-    malloc_printerr ("_int_memalign(): unaligned chunk detected");
+-
+-  aligned_m = PTR_ALIGN_UP (m, alignment);
+-
+-  INTERNAL_SIZE_T front_extra = (intptr_t) aligned_m - (intptr_t) m;
+-
+-  /* We can't trim off the front as it's too small.  */
+-  if (front_extra > 0 && front_extra < MINSIZE)
+-    return 0;
+-
+-  /* If it's a perfect fit, it's an exception to the return value rule
+-     (we would return zero waste, which looks like "not usable"), so
+-     handle it here by returning a small non-zero value instead.  */
+-  if (size == nb && front_extra == 0)
+-    return 1;
+-
+-  /* If the block we need fits in the chunk, calculate total waste.  */
+-  if (size > nb + front_extra)
+-    return size - nb;
+-
+-  /* Can't use this chunk.  */
+-  return 0;
+-}
+-
+ /* BYTES is user requested bytes, not requested chunksize bytes.  */
+ static void *
+ _int_memalign (mstate av, size_t alignment, size_t bytes)
+@@ -5082,7 +5040,6 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+   mchunkptr remainder;            /* spare room at end to split off */
+   unsigned long remainder_size;   /* its size */
+   INTERNAL_SIZE_T size;
+-  mchunkptr victim;
+ 
+   nb = checked_request2size (bytes);
+   if (nb == 0)
+@@ -5101,129 +5058,13 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
+      we don't find anything in those bins, the common malloc code will
+      scan starting at 2x.  */
+ 
+-  /* This will be set if we found a candidate chunk.  */
+-  victim = NULL;
+-
+-  /* Fast bins are singly-linked, hard to remove a chunk from the middle
+-     and unlikely to meet our alignment requirements.  We have not done
+-     any experimentation with searching for aligned fastbins.  */
+-
+-  if (av != NULL)
+-    {
+-      int first_bin_index;
+-      int first_largebin_index;
+-      int last_bin_index;
+-
+-      if (in_smallbin_range (nb))
+-	first_bin_index = smallbin_index (nb);
+-      else
+-	first_bin_index = largebin_index (nb);
+-
+-      if (in_smallbin_range (nb * 2))
+-	last_bin_index = smallbin_index (nb * 2);
+-      else
+-	last_bin_index = largebin_index (nb * 2);
+-
+-      first_largebin_index = largebin_index (MIN_LARGE_SIZE);
+-
+-      int victim_index;                 /* its bin index */
+-
+-      for (victim_index = first_bin_index;
+-	   victim_index < last_bin_index;
+-	   victim_index ++)
+-	{
+-	  victim = NULL;
+-
+-	  if (victim_index < first_largebin_index)
+-	    {
+-	      /* Check small bins.  Small bin chunks are doubly-linked despite
+-		 being the same size.  */
+-
+-	      mchunkptr fwd;                    /* misc temp for linking */
+-	      mchunkptr bck;                    /* misc temp for linking */
+-
+-	      bck = bin_at (av, victim_index);
+-	      fwd = bck->fd;
+-	      while (fwd != bck)
+-		{
+-		  if (chunk_ok_for_memalign (fwd, alignment, nb) > 0)
+-		    {
+-		      victim = fwd;
+-
+-		      /* Unlink it */
+-		      victim->fd->bk = victim->bk;
+-		      victim->bk->fd = victim->fd;
+-		      break;
+-		    }
+-
+-		  fwd = fwd->fd;
+-		}
+-	    }
+-	  else
+-	    {
+-	      /* Check large bins.  */
+-	      mchunkptr fwd;                    /* misc temp for linking */
+-	      mchunkptr bck;                    /* misc temp for linking */
+-	      mchunkptr best = NULL;
+-	      size_t best_size = 0;
+-
+-	      bck = bin_at (av, victim_index);
+-	      fwd = bck->fd;
++  /* Call malloc with worst case padding to hit alignment. */
++  m = (char *) (_int_malloc (av, nb + alignment + MINSIZE));
+ 
+-	      while (fwd != bck)
+-		{
+-		  int extra;
+-
+-		  if (chunksize (fwd) < nb)
+-		    break;
+-		  extra = chunk_ok_for_memalign (fwd, alignment, nb);
+-		  if (extra > 0
+-		      && (extra <= best_size || best == NULL))
+-		    {
+-		      best = fwd;
+-		      best_size = extra;
+-		    }
++  if (m == 0)
++    return 0;           /* propagate failure */
+ 
+-		  fwd = fwd->fd;
+-		}
+-	      victim = best;
+-
+-	      if (victim != NULL)
+-		{
+-		  unlink_chunk (av, victim);
+-		  break;
+-		}
+-	    }
+-
+-	  if (victim != NULL)
+-	    break;
+-	}
+-    }
+-
+-  /* Strategy: find a spot within that chunk that meets the alignment
+-     request, and then possibly free the leading and trailing space.
+-     This strategy is incredibly costly and can lead to external
+-     fragmentation if header and footer chunks are unused.  */
+-
+-  if (victim != NULL)
+-    {
+-      p = victim;
+-      m = chunk2mem (p);
+-      set_inuse (p);
+-      if (av != &main_arena)
+-	set_non_main_arena (p);
+-    }
+-  else
+-    {
+-      /* Call malloc with worst case padding to hit alignment. */
+-
+-      m = (char *) (_int_malloc (av, nb + alignment + MINSIZE));
+-
+-      if (m == 0)
+-	return 0;           /* propagate failure */
+-
+-      p = mem2chunk (m);
+-    }
++  p = mem2chunk (m);
+ 
+   if ((((unsigned long) (m)) % alignment) != 0)   /* misaligned */
+     {
+diff --git a/malloc/tst-memalign-2.c b/malloc/tst-memalign-2.c
+index f229283dbf..ecd6fa249e 100644
+--- a/malloc/tst-memalign-2.c
++++ b/malloc/tst-memalign-2.c
+@@ -86,7 +86,8 @@ do_test (void)
+       TEST_VERIFY (tcache_allocs[i].ptr1 == tcache_allocs[i].ptr2);
+     }
+ 
+-  /* Test for non-head tcache hits.  */
++  /* Test for non-head tcache hits.  This exercises the memalign
++     scanning code to find matching allocations.  */
+   for (i = 0; i < array_length (ptr); ++ i)
+     {
+       if (i == 4)
+@@ -113,7 +114,9 @@ do_test (void)
+   free (p);
+   TEST_VERIFY (count > 0);
+ 
+-  /* Large bins test.  */
++  /* Large bins test.  This verifies that the over-allocated parts
++     that memalign releases for future allocations can be reused by
++     memalign itself at least in some cases.  */
+ 
+   for (i = 0; i < LN; ++ i)
+     {
diff --git a/srcpkgs/glibc/template b/srcpkgs/glibc/template
index 20805fb52e816..cf7cd073a9500 100644
--- a/srcpkgs/glibc/template
+++ b/srcpkgs/glibc/template
@@ -1,7 +1,7 @@
 # Template file for 'glibc'
 pkgname=glibc
 version=2.38
-revision=1
+revision=2
 bootstrap=yes
 short_desc="GNU C library"
 maintainer="Enno Boland <gottox@voidlinux.org>"

From ff904ab744dd04a5c17083a70272e2ff86e453f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 02/12] cross-aarch64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-aarch64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-aarch64-linux-gnu/template b/srcpkgs/cross-aarch64-linux-gnu/template
index 690e27b8adc15..91dadf4b01179 100644
--- a/srcpkgs/cross-aarch64-linux-gnu/template
+++ b/srcpkgs/cross-aarch64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-aarch64-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--with-arch=armv8-a"
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"

From 548ba21a512a08be25773095fc183be435f967fd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 03/12] cross-i686-pc-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-i686-pc-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-i686-pc-linux-gnu/template b/srcpkgs/cross-i686-pc-linux-gnu/template
index c6b5319ac5d78..c695fb2f37a20 100644
--- a/srcpkgs/cross-i686-pc-linux-gnu/template
+++ b/srcpkgs/cross-i686-pc-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-i686-pc-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"
 makedepends="isl-devel libmpc-devel gmp-devel mpfr-devel

From f72ee165065f2ae7b744bf85244ad372299030ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 04/12] cross-powerpc-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpc-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpc-linux-gnu/template b/srcpkgs/cross-powerpc-linux-gnu/template
index 31578760f17e5..337d1c652ab91 100644
--- a/srcpkgs/cross-powerpc-linux-gnu/template
+++ b/srcpkgs/cross-powerpc-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpc-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify
  --enable-autolink-libatomic"

From 581c69600ea2b4f21fbe69c735d17fca9b9c8dd4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 05/12] cross-powerpc64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpc64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpc64-linux-gnu/template b/srcpkgs/cross-powerpc64-linux-gnu/template
index cdbd1e26f725d..f4af015668699 100644
--- a/srcpkgs/cross-powerpc64-linux-gnu/template
+++ b/srcpkgs/cross-powerpc64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpc64-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify --with-abi=elfv2
  --enable-targets=powerpc-linux --enable-autolink-libatomic"

From 97ed7c4299283cafada2d5967b12fcf669ff3baa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 06/12] cross-powerpc64le-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpc64le-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpc64le-linux-gnu/template b/srcpkgs/cross-powerpc64le-linux-gnu/template
index 82dc196a247d5..6181e0e4d236d 100644
--- a/srcpkgs/cross-powerpc64le-linux-gnu/template
+++ b/srcpkgs/cross-powerpc64le-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpc64le-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify --with-abi=elfv2
  --enable-targets=powerpcle-linux --enable-autolink-libatomic"

From c091976bf36fce2e3ada23e2b1af5e5f7d56ed70 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:08 -0300
Subject: [PATCH 07/12] cross-powerpcle-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpcle-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpcle-linux-gnu/template b/srcpkgs/cross-powerpcle-linux-gnu/template
index 7576278738d95..720cf97873314 100644
--- a/srcpkgs/cross-powerpcle-linux-gnu/template
+++ b/srcpkgs/cross-powerpcle-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpcle-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify
  --enable-autolink-libatomic"

From 9d4e85a7467ad1918b5bc7b652c6c0690c93e080 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:08 -0300
Subject: [PATCH 08/12] cross-riscv64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-riscv64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-riscv64-linux-gnu/template b/srcpkgs/cross-riscv64-linux-gnu/template
index 5cd6da69c32e8..bd1a010d98269 100644
--- a/srcpkgs/cross-riscv64-linux-gnu/template
+++ b/srcpkgs/cross-riscv64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-riscv64-linux-gnu
 version=0.35
-revision=4
+revision=5
 build_style=void-cross
 configure_args="--with-arch=rv64gc --with-abi=lp64d --enable-autolink-libatomic --disable-multilib"
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"

From 40bba1e91878a27e9ac162391754017fd5e4319f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:08 -0300
Subject: [PATCH 09/12] cross-x86_64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-x86_64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-x86_64-linux-gnu/template b/srcpkgs/cross-x86_64-linux-gnu/template
index 7eabe1625b23e..b7c2bb3f04fd7 100644
--- a/srcpkgs/cross-x86_64-linux-gnu/template
+++ b/srcpkgs/cross-x86_64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-x86_64-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"
 makedepends="isl-devel libmpc-devel gmp-devel mpfr-devel

From f8ed1ca48ac4a52e67d07697202aacbe1cf00543 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 20:22:30 -0300
Subject: [PATCH 10/12] cross-arm-linux-gnueabi: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-arm-linux-gnueabi/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-arm-linux-gnueabi/template b/srcpkgs/cross-arm-linux-gnueabi/template
index 7cf232218eedb..a9b157cb8375b 100644
--- a/srcpkgs/cross-arm-linux-gnueabi/template
+++ b/srcpkgs/cross-arm-linux-gnueabi/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-arm-linux-gnueabi
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--with-arch=armv5te --with-float=soft
  --enable-autolink-libatomic"

From a6fc619ed841615411ac0073a9fc5714ab9e7556 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 20:22:35 -0300
Subject: [PATCH 11/12] cross-arm-linux-gnueabihf: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-arm-linux-gnueabihf/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-arm-linux-gnueabihf/template b/srcpkgs/cross-arm-linux-gnueabihf/template
index 6cd9d50f7f459..637675a151af2 100644
--- a/srcpkgs/cross-arm-linux-gnueabihf/template
+++ b/srcpkgs/cross-arm-linux-gnueabihf/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-arm-linux-gnueabihf
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--with-arch=armv6 --with-fpu=vfp --with-float=hard
  --enable-autolink-libatomic"

From 5ab4198fe94d22bd00b7fb505b1ec2f905cb63f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 20:22:35 -0300
Subject: [PATCH 12/12] cross-armv7l-linux-gnueabihf: rebuild to fix
 performance regression in posix_memalign

---
 srcpkgs/cross-armv7l-linux-gnueabihf/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-armv7l-linux-gnueabihf/template b/srcpkgs/cross-armv7l-linux-gnueabihf/template
index e50363fcb4f75..18d502a6a41c8 100644
--- a/srcpkgs/cross-armv7l-linux-gnueabihf/template
+++ b/srcpkgs/cross-armv7l-linux-gnueabihf/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-armv7l-linux-gnueabihf
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--with-arch=armv7-a --with-fpu=vfpv3 --with-float=hard"
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [ci skip] glibc: fix memalign performance regression
  2023-12-26 22:40 [PR PATCH] glibc: fix memalign performance regression tornaria
  2023-12-26 22:52 ` [PR PATCH] [Updated] " tornaria
  2023-12-26 23:23 ` [PR PATCH] [Updated] [ci skip] " tornaria
@ 2023-12-27  0:17 ` oreo639
  2023-12-27  1:58 ` [PR PATCH] [Updated] " sgn
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: oreo639 @ 2023-12-27  0:17 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 146 bytes --]

New comment by oreo639 on void-packages repository

https://github.com/void-linux/void-packages/pull/47914#issuecomment-1869836859

Comment:
Lgtm

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PR PATCH] [Updated] [ci skip] glibc: fix memalign performance regression
  2023-12-26 22:40 [PR PATCH] glibc: fix memalign performance regression tornaria
                   ` (2 preceding siblings ...)
  2023-12-27  0:17 ` oreo639
@ 2023-12-27  1:58 ` sgn
  2023-12-27  1:58 ` [PR PATCH] [Closed]: " sgn
  2023-12-27  8:04 ` [PR PATCH] [Merged]: " sgn
  5 siblings, 0 replies; 7+ messages in thread
From: sgn @ 2023-12-27  1:58 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 1946 bytes --]

There is an updated pull request by sgn against master on the void-packages repository

https://github.com/tornaria/void-packages glibc
https://github.com/void-linux/void-packages/pull/47914

[ci skip] glibc: fix memalign performance regression

The upgrade to 2.38 brought a very sad performance regression in sagemath:
```
$ time python -c 'from sage.graphs.generators.distance_regular import DoubleGrassmannGraph; print(DoubleGrassmannGraph(2,2))'
<string>:1: UserWarning: Resolving lazy import GF during startup
<string>:1: UserWarning: Resolving lazy import VectorSpace during startup
Double Grassmann graph (5, 2, 2)

real	0m30.101s
user	0m29.959s
sys	0m0.060s
```
With glibc 2.36 (or with 2.38 after this PR), the same command takes about 1-2 seconds.

Thanks to @oreo639 for figuring out that the cause is glibc bug 30723: https://sourceware.org/bugzilla/show_bug.cgi?id=30723

Indeed, all the performance regressions I was seeing are gone now.
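
For anyone who wants to compare allocator behaviour before and after the rebuild without installing sagemath, a rough sketch of a timing helper follows. It is only an illustration: the function name `time_memalign_workload`, the slot count, and the pseudo-random size pattern are all made up here, it is not the sagemath workload, and it may or may not reproduce the slowdown from bug 30723 on a given system.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not part of glibc or of this PR): performs
   `rounds` aligned allocations of pseudo-random size, freeing an older
   block each time so that free chunks accumulate in the allocator bins.
   Returns the elapsed wall-clock time in seconds, or -1.0 if an
   allocation fails or the arguments are invalid.  */
double
time_memalign_workload (size_t rounds, size_t alignment, size_t max_size)
{
  enum { SLOTS = 256 };           /* arbitrary number of live blocks */
  void *slots[SLOTS] = { 0 };
  struct timespec start, end;
  unsigned int seed = 12345u;     /* fixed seed for repeatable runs */
  double result;

  if (max_size == 0)
    return -1.0;

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (size_t i = 0; i < rounds; i++)
    {
      /* Simple linear congruential step; quality does not matter here.  */
      seed = seed * 1103515245u + 12345u;
      size_t slot = (seed >> 16) % SLOTS;
      size_t size = 1 + (seed >> 8) % max_size;
      void *block = NULL;

      free (slots[slot]);
      slots[slot] = NULL;

      /* posix_memalign returns non-zero on failure, including EINVAL
         for an alignment that is not a valid power of two.  */
      if (posix_memalign (&block, alignment, size) != 0)
        {
          for (size_t j = 0; j < SLOTS; j++)
            free (slots[j]);
          return -1.0;
        }
      slots[slot] = block;
    }
  clock_gettime (CLOCK_MONOTONIC, &end);

  for (size_t j = 0; j < SLOTS; j++)
    free (slots[j]);

  result = (double) (end.tv_sec - start.tv_sec)
           + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
  return result;
}
```

Calling this from a small `main` with, say, a few hundred thousand rounds and printing the result on both glibc builds gives a crude comparison; the numbers depend heavily on the size and alignment mix, so the sagemath command above remains the real test.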

#### Testing the changes
- I tested the changes in this PR: **briefly**



A patch file from https://github.com/void-linux/void-packages/pull/47914.patch is attached

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: github-pr-glibc-47914.patch --]
[-- Type: text/x-diff, Size: 10639 bytes --]

From 28d3f61921244c4575467147201ec9f4b8948c9c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 01/12] cross-aarch64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-aarch64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-aarch64-linux-gnu/template b/srcpkgs/cross-aarch64-linux-gnu/template
index 690e27b8adc15..91dadf4b01179 100644
--- a/srcpkgs/cross-aarch64-linux-gnu/template
+++ b/srcpkgs/cross-aarch64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-aarch64-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--with-arch=armv8-a"
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"

From e6bdb72d0ea4685170bf8a480b41ffcde4c57658 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 02/12] cross-i686-pc-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-i686-pc-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-i686-pc-linux-gnu/template b/srcpkgs/cross-i686-pc-linux-gnu/template
index c6b5319ac5d78..c695fb2f37a20 100644
--- a/srcpkgs/cross-i686-pc-linux-gnu/template
+++ b/srcpkgs/cross-i686-pc-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-i686-pc-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"
 makedepends="isl-devel libmpc-devel gmp-devel mpfr-devel

From 6e184ec27e50462eb27dfc33cafaa57fcc80ea63 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 03/12] cross-powerpc-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpc-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpc-linux-gnu/template b/srcpkgs/cross-powerpc-linux-gnu/template
index 31578760f17e5..337d1c652ab91 100644
--- a/srcpkgs/cross-powerpc-linux-gnu/template
+++ b/srcpkgs/cross-powerpc-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpc-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify
  --enable-autolink-libatomic"

From 312a17234d6b25ab3c6e957d5d9967bd6d091cd3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 04/12] cross-powerpc64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpc64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpc64-linux-gnu/template b/srcpkgs/cross-powerpc64-linux-gnu/template
index cdbd1e26f725d..f4af015668699 100644
--- a/srcpkgs/cross-powerpc64-linux-gnu/template
+++ b/srcpkgs/cross-powerpc64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpc64-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify --with-abi=elfv2
  --enable-targets=powerpc-linux --enable-autolink-libatomic"

From 20a01b31b3e4cd946519025d5328028e7764e3c8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:07 -0300
Subject: [PATCH 05/12] cross-powerpc64le-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpc64le-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpc64le-linux-gnu/template b/srcpkgs/cross-powerpc64le-linux-gnu/template
index 82dc196a247d5..6181e0e4d236d 100644
--- a/srcpkgs/cross-powerpc64le-linux-gnu/template
+++ b/srcpkgs/cross-powerpc64le-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpc64le-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify --with-abi=elfv2
  --enable-targets=powerpcle-linux --enable-autolink-libatomic"

From 00a980b0504593b4ac1b9ef26d74ef6b14325a3a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:08 -0300
Subject: [PATCH 06/12] cross-powerpcle-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-powerpcle-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-powerpcle-linux-gnu/template b/srcpkgs/cross-powerpcle-linux-gnu/template
index 7576278738d95..720cf97873314 100644
--- a/srcpkgs/cross-powerpcle-linux-gnu/template
+++ b/srcpkgs/cross-powerpcle-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-powerpcle-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--enable-secureplt --disable-vtable-verify
  --enable-autolink-libatomic"

From 3a848ffead17ce9d4233dce3ad6a0a150f7218ce Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:08 -0300
Subject: [PATCH 07/12] cross-riscv64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-riscv64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-riscv64-linux-gnu/template b/srcpkgs/cross-riscv64-linux-gnu/template
index 5cd6da69c32e8..bd1a010d98269 100644
--- a/srcpkgs/cross-riscv64-linux-gnu/template
+++ b/srcpkgs/cross-riscv64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-riscv64-linux-gnu
 version=0.35
-revision=4
+revision=5
 build_style=void-cross
 configure_args="--with-arch=rv64gc --with-abi=lp64d --enable-autolink-libatomic --disable-multilib"
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"

From b51e60727a311b27146de3abe41a02019e01b5ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 19:52:08 -0300
Subject: [PATCH 08/12] cross-x86_64-linux-gnu: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-x86_64-linux-gnu/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-x86_64-linux-gnu/template b/srcpkgs/cross-x86_64-linux-gnu/template
index 7eabe1625b23e..b7c2bb3f04fd7 100644
--- a/srcpkgs/cross-x86_64-linux-gnu/template
+++ b/srcpkgs/cross-x86_64-linux-gnu/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-x86_64-linux-gnu
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"
 makedepends="isl-devel libmpc-devel gmp-devel mpfr-devel

From df4818298f809e97afe7bfa30f0c930a959bbce9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 20:22:30 -0300
Subject: [PATCH 09/12] cross-arm-linux-gnueabi: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-arm-linux-gnueabi/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-arm-linux-gnueabi/template b/srcpkgs/cross-arm-linux-gnueabi/template
index 7cf232218eedb..a9b157cb8375b 100644
--- a/srcpkgs/cross-arm-linux-gnueabi/template
+++ b/srcpkgs/cross-arm-linux-gnueabi/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-arm-linux-gnueabi
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--with-arch=armv5te --with-float=soft
  --enable-autolink-libatomic"

From 916176228e11a9c434c27b4d26b53c3e483e008e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 20:22:35 -0300
Subject: [PATCH 10/12] cross-arm-linux-gnueabihf: rebuild to fix performance
 regression in posix_memalign

---
 srcpkgs/cross-arm-linux-gnueabihf/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-arm-linux-gnueabihf/template b/srcpkgs/cross-arm-linux-gnueabihf/template
index 6cd9d50f7f459..637675a151af2 100644
--- a/srcpkgs/cross-arm-linux-gnueabihf/template
+++ b/srcpkgs/cross-arm-linux-gnueabihf/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-arm-linux-gnueabihf
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--with-arch=armv6 --with-fpu=vfp --with-float=hard
  --enable-autolink-libatomic"

From 38bd181e11bfb5039c76b86919707d0614c17d4d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gonzalo=20Tornar=C3=ADa?= <tornaria@cmat.edu.uy>
Date: Tue, 26 Dec 2023 20:22:35 -0300
Subject: [PATCH 11/12] cross-armv7l-linux-gnueabihf: rebuild to fix
 performance regression in posix_memalign

Close: #47914
---
 srcpkgs/cross-armv7l-linux-gnueabihf/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/cross-armv7l-linux-gnueabihf/template b/srcpkgs/cross-armv7l-linux-gnueabihf/template
index e50363fcb4f75..18d502a6a41c8 100644
--- a/srcpkgs/cross-armv7l-linux-gnueabihf/template
+++ b/srcpkgs/cross-armv7l-linux-gnueabihf/template
@@ -5,7 +5,7 @@ _glibc_version=2.38
 _linux_version=5.10.4
 pkgname=cross-armv7l-linux-gnueabihf
 version=0.35
-revision=6
+revision=7
 build_style=void-cross
 configure_args="--with-arch=armv7-a --with-fpu=vfpv3 --with-float=hard"
 hostmakedepends="texinfo tar gcc-objc gcc-go flex perl python3 pkg-config"

From cef96d68bc63a8cfbf46698f8215b27aa6d03dc5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C4=90o=C3=A0n=20Tr=E1=BA=A7n=20C=C3=B4ng=20Danh?=
 <congdanhqx@gmail.com>
Date: Wed, 27 Dec 2023 08:57:56 +0700
Subject: [PATCH 12/12] busybox: for posix_memalign

---
 srcpkgs/busybox/template | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/srcpkgs/busybox/template b/srcpkgs/busybox/template
index 0b81f496c0769..e9932b73ad329 100644
--- a/srcpkgs/busybox/template
+++ b/srcpkgs/busybox/template
@@ -1,7 +1,7 @@
 # Template file for 'busybox'
 pkgname=busybox
 version=1.34.1
-revision=5
+revision=6
 hostmakedepends="perl"
 checkdepends="tar which zip"
 short_desc="Swiss Army Knife of Embedded Linux"
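
Every rebuild patch in this series makes the same one-line change: the `revision=` field of a template goes up by one, so the package is rebuilt against the fixed glibc. A minimal sketch of that edit follows. The helper `bump_revision` is hypothetical and is not part of xbps-src or void-packages, and it assumes the template has a single `revision=N` line at the start of a line.

```python
import re


def bump_revision(template_text: str) -> str:
    """Return template_text with its first `revision=N` line incremented by one."""
    pattern = re.compile(r"^revision=(\d+)$", flags=re.MULTILINE)
    match = pattern.search(template_text)
    if match is None:
        raise ValueError("no revision= line found")
    new_line = f"revision={int(match.group(1)) + 1}"
    # Splice the new line in place so every other line stays byte-identical.
    return template_text[:match.start()] + new_line + template_text[match.end():]


# Illustrative input modelled on the busybox hunk above.
example = 'pkgname=busybox\nversion=1.34.1\nrevision=5\nhostmakedepends="perl"\n'
print(bump_revision(example))
```

Only the `revision=` line changes, which matches the `1 insertion(+), 1 deletion(-)` stat reported for each template.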


* Re: [PR PATCH] [Closed]: [ci skip] glibc: fix memalign performance regression
  2023-12-26 22:40 [PR PATCH] glibc: fix memalign performance regression tornaria
                   ` (3 preceding siblings ...)
  2023-12-27  1:58 ` [PR PATCH] [Updated] " sgn
@ 2023-12-27  1:58 ` sgn
  2023-12-27  8:04 ` [PR PATCH] [Merged]: " sgn
  5 siblings, 0 replies; 7+ messages in thread
From: sgn @ 2023-12-27  1:58 UTC (permalink / raw)
  To: ml


There's a closed pull request on the void-packages repository

[ci skip] glibc: fix memalign performance regression
https://github.com/void-linux/void-packages/pull/47914

Description:

The upgrade to glibc 2.38 brought a very sad performance regression in sagemath:
```
$ time python -c 'from sage.graphs.generators.distance_regular import DoubleGrassmannGraph; print(DoubleGrassmannGraph(2,2))'
<string>:1: UserWarning: Resolving lazy import GF during startup
<string>:1: UserWarning: Resolving lazy import VectorSpace during startup
Double Grassmann graph (5, 2, 2)

real	0m30.101s
user	0m29.959s
sys	0m0.060s
```
while the same command with glibc 2.36 (or after this PR) takes about 1-2 seconds.

Thanks to @oreo639 for figuring out that the cause was https://sourceware.org/bugzilla/show_bug.cgi?id=30723

Indeed, all the performance regressions I was seeing are gone now.
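
To check the allocator directly rather than through sagemath, a rough timing harness along these lines may help. It is only a sketch: `time_posix_memalign` and its default parameters are hypothetical, it calls the C library through `ctypes`, and it is not claimed to reproduce the slowdown above, which depends on the heap state the real workload builds up.

```python
import ctypes
import time

# Load the C library already linked into the running interpreter.
libc = ctypes.CDLL(None)
libc.posix_memalign.argtypes = [
    ctypes.POINTER(ctypes.c_void_p),
    ctypes.c_size_t,
    ctypes.c_size_t,
]
libc.posix_memalign.restype = ctypes.c_int
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def time_posix_memalign(count: int = 10000, alignment: int = 64, size: int = 100) -> float:
    """Allocate `count` aligned blocks, free them, and return elapsed seconds."""
    blocks = []
    start = time.perf_counter()
    for _ in range(count):
        ptr = ctypes.c_void_p()
        rc = libc.posix_memalign(ctypes.byref(ptr), alignment, size)
        if rc != 0:
            raise MemoryError(f"posix_memalign failed with error {rc}")
        blocks.append(ptr)
    for ptr in blocks:
        libc.free(ptr)
    return time.perf_counter() - start


print(f"{time_posix_memalign():.4f} s")
```

Running it before and after the glibc update gives two numbers to compare; any difference should be read with the caveat above in mind.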

#### Testing the changes
- I tested the changes in this PR: **briefly**




* Re: [PR PATCH] [Merged]: [ci skip] glibc: fix memalign performance regression
  2023-12-26 22:40 [PR PATCH] glibc: fix memalign performance regression tornaria
                   ` (4 preceding siblings ...)
  2023-12-27  1:58 ` [PR PATCH] [Closed]: " sgn
@ 2023-12-27  8:04 ` sgn
  5 siblings, 0 replies; 7+ messages in thread
From: sgn @ 2023-12-27  8:04 UTC (permalink / raw)
  To: ml


There's a merged pull request on the void-packages repository

[ci skip] glibc: fix memalign performance regression
https://github.com/void-linux/void-packages/pull/47914




end of thread, other threads:[~2023-12-27  8:04 UTC | newest]

Thread overview: 7+ messages
2023-12-26 22:40 [PR PATCH] glibc: fix memalign performance regression tornaria
2023-12-26 22:52 ` [PR PATCH] [Updated] " tornaria
2023-12-26 23:23 ` [PR PATCH] [Updated] [ci skip] " tornaria
2023-12-27  0:17 ` oreo639
2023-12-27  1:58 ` [PR PATCH] [Updated] " sgn
2023-12-27  1:58 ` [PR PATCH] [Closed]: " sgn
2023-12-27  8:04 ` [PR PATCH] [Merged]: " sgn
