Github messages for voidlinux
* [PR PATCH] RFC: Check for reproducible builds.
@ 2021-04-30  8:18 Gottox
  2021-04-30 12:53 ` [PR REVIEW] " ericonr
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Gottox @ 2021-04-30  8:18 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 1472 bytes --]

There is a new pull request by Gottox against master on the void-packages repository

https://github.com/Gottox/void-packages repro-check
https://github.com/void-linux/void-packages/pull/30588

RFC: Check for reproducible builds.
### Introduction

In void-packages the packages are anything but reproducible. Many other distributions, first and foremost [NixOS](https://nixos.org/) and even [Debian](https://wiki.debian.org/ReproducibleBuilds), have already done a lot of work to generate packages with stable checksums. Void's build system is able to do something similar - with a few constraints - without much work.

### This is a starting point, not more.

As a first step to actually get an idea of how bad the situation is, I implemented a simple checker that compares the checksum of packages defined in templates to the actual result and spits out warnings if they don't match.

This also introduces new variables to the templates:

`pkg_checksum_<arch>`, where `<arch>` is a sanitized version of the resulting architecture (for example, `x86_64_musl` for `x86_64-musl`).
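
To illustrate the naming scheme, here is a minimal bash sketch of how an architecture name maps to the template variable and how the value is looked up. It mirrors the `${arch//-/_}` substitution and indirect expansion used in the attached hook. The helper names `checksum_var_name` and `checksum_for_arch` and the checksum value are hypothetical and only for illustration.

```bash
# Map an architecture name to the template variable name
# (dashes are not valid in shell variable names, so they become underscores).
checksum_var_name() {
	local arch=$1
	printf 'pkg_checksum_%s\n' "${arch//-/_}"
}

# Look up the expected checksum for an architecture via bash indirect
# expansion. Prints an empty line when the template defines no value.
checksum_for_arch() {
	local ptr
	ptr=$(checksum_var_name "$1")
	printf '%s\n' "${!ptr}"
}

# Placeholder value, not a real package checksum.
pkg_checksum_x86_64_musl="placeholder-checksum"

checksum_var_name x86_64-musl
checksum_for_arch x86_64-musl
```

A template would carry one such variable per architecture that should be checked.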

### Constraints:

* The packages are currently built with the git hash baked in. This is an issue because the build is only stable within a given commit.

### ToDo

* The documentation is currently not done, but will be added later.
* Find a way to make our package format reproducible across commits.

A patch file from https://github.com/void-linux/void-packages/pull/30588.patch is attached

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: github-pr-repro-check-30588.patch --]
[-- Type: text/x-diff, Size: 3481 bytes --]

From 997b6505d79698cb042bf549c1faa8e31c0a5158 Mon Sep 17 00:00:00 2001
From: Enno Boland <gottox@voidlinux.org>
Date: Fri, 30 Apr 2021 10:02:58 +0200
Subject: [PATCH] common/hooks: add hook to check for resulting package
 checksum mismatches

---
 common/hooks/post-pkg/01-check-reproduce.sh | 43 +++++++++++++++++++++
 etc/defaults.conf                           |  6 +++
 xbps-src                                    |  3 +-
 3 files changed, 51 insertions(+), 1 deletion(-)
 create mode 100644 common/hooks/post-pkg/01-check-reproduce.sh

diff --git a/common/hooks/post-pkg/01-check-reproduce.sh b/common/hooks/post-pkg/01-check-reproduce.sh
new file mode 100644
index 000000000000..f56abb1b11df
--- /dev/null
+++ b/common/hooks/post-pkg/01-check-reproduce.sh
@@ -0,0 +1,43 @@
+# This hook compares the checksum of the package with the saved value
+
+hook() {
+	local arch= binpkg= checksum_ptr= checksum_have= checksum_want=
+
+	if [ -z "$XBPS_CHECK_REPRODUCIBLE" ]; then
+		return 0;
+	fi
+
+	if [ -z "$XBPS_USE_BUILD_MTIME" ]; then
+		msg_warn "reproducibility check will only report correct results when\n"
+		msg_warn "XBPS_USE_BUILD_MTIME is enabled.\n"
+	fi
+
+	if [ -z "$XBPS_CROSS_BUILD" -a -n "$XBPS_ARCH" -a "$XBPS_ARCH" != "$XBPS_TARGET_MACHINE" ]; then
+		arch=${XBPS_ARCH}
+	elif [ -n "$XBPS_TARGET_MACHINE" ]; then
+		arch=$XBPS_TARGET_MACHINE
+	else
+		arch=$XBPS_MACHINE
+	fi
+	binpkg=${pkgver}.${arch}.xbps
+
+	checksum_ptr="pkg_checksum_${arch//-/_}"
+	checksum_want=${!checksum_ptr}
+
+	checksum_have=$(sha256sum "$binpkg" | awk '{ print $1 }')
+
+	if [ -z "${checksum_want}" ]; then
+		msg_normal "$pkgver: template does not define a pkg_checksum\n"
+		msg_normal "$pkgver: if the build is reproducible define the package checksum in the template:\n"
+		msg_normal "$pkgver: $checksum_ptr=\"$checksum_have\"\n"
+		return 0
+	fi
+
+	if [ "${checksum_have}" != "${checksum_want}" ]; then
+		msg_warn "${pkgver}: Checksum mismatch. reproducible build seems to be broken.\n"
+		msg_warn "${pkgver}: Gather relevant system info:\n"
+		msg_normal "CPU: $(grep "^model name" /proc/cpuinfo | head -n 1 | sed 's/.*: //')\n"
+	else
+		msg_normal "${pkgver}: Checksums match; build seems to be reproducible.\n"
+	fi
+}
diff --git a/etc/defaults.conf b/etc/defaults.conf
index 6147954a18af..55b9568c812a 100644
--- a/etc/defaults.conf
+++ b/etc/defaults.conf
@@ -130,6 +130,12 @@ XBPS_SUCMD="sudo /bin/sh -c"
 #XBPS_CHROOT_CMD=uchroot
 #XBPS_CHROOT_CMD_ARGS=""
 
+# [OPTIONAL]
+# If enabled, xbps-src will check the resulting checksum of a package against
+# a defined one. This helps to detect packages that have non-deterministic builds.
+#
+#XBPS_CHECK_REPRODUCIBLE=yes
+
 # [OPTIONAL]
 # Enable to use the standard mtime of files. Otherwise it will be rewritten to
 # the HEAD commit time. Requires git when disabled.
diff --git a/xbps-src b/xbps-src
index c3cd7e5db10b..7fdb2dd41b57 100755
--- a/xbps-src
+++ b/xbps-src
@@ -635,7 +635,8 @@ export XBPS_SHUTILSDIR XBPS_CROSSPFDIR XBPS_TRIGGERSDIR \
     XBPS_DESTDIR XBPS_MACHINE XBPS_TEMP_MASTERDIR XBPS_BINPKG_EXISTS \
     XBPS_LIBEXECDIR XBPS_DISTDIR XBPS_DISTFILES_MIRROR XBPS_ALLOW_RESTRICTED \
     XBPS_USE_GIT_COMMIT_DATE XBPS_PKG_COMPTYPE XBPS_REPO_COMPTYPE \
-    XBPS_BUILDHELPERDIR XBPS_USE_BUILD_MTIME XBPS_BUILD_ENVIRONMENT
+    XBPS_BUILDHELPERDIR XBPS_CHECK_REPRODUCIBLE XBPS_USE_BUILD_MTIME \
+    XBPS_BUILD_ENVIRONMENT
 
 for i in REPOSITORY DESTDIR BUILDDIR SRCDISTDIR; do
     eval val="\$XBPS_$i"

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PR REVIEW] RFC: Check for reproducible builds.
  2021-04-30  8:18 [PR PATCH] RFC: Check for reproducible builds Gottox
@ 2021-04-30 12:53 ` ericonr
  2021-04-30 13:01 ` ericonr
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ericonr @ 2021-04-30 12:53 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 182 bytes --]

New review comment by ericonr on void-packages repository

https://github.com/void-linux/void-packages/pull/30588#discussion_r623848137

Comment:
It's easier to use `xbps-digest` :p
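
A minimal sketch of what that could look like, assuming `xbps-digest` prints only the SHA-256 hex digest of the file it is given. The helper name `file_sha256` is hypothetical, and the `sha256sum` fallback exists only so the snippet also runs where XBPS is not installed.

```bash
# Print the SHA-256 of a file, preferring xbps-digest when it is installed.
file_sha256() {
	if command -v xbps-digest >/dev/null 2>&1; then
		xbps-digest "$1"
	else
		# Fallback: sha256sum prints "<hash>  <file>", keep only the hash.
		sha256sum "$1" | awk '{ print $1 }'
	fi
}
```

In the hook this would replace the `sha256sum ... | awk` pipeline, for example `checksum_have=$(file_sha256 "$binpkg")`.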


* Re: RFC: Check for reproducible builds.
  2021-04-30  8:18 [PR PATCH] RFC: Check for reproducible builds Gottox
  2021-04-30 12:53 ` [PR REVIEW] " ericonr
@ 2021-04-30 13:01 ` ericonr
  2021-04-30 15:55 ` ericonr
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ericonr @ 2021-04-30 13:01 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 777 bytes --]

New comment by ericonr on void-packages repository

https://github.com/void-linux/void-packages/pull/30588#issuecomment-830079060

Comment:
To explain my comment above, tracking `source-revisions` is equivalent to `.BUILDINFO` files used by other distros (which can be necessary because they don't use monorepos and so the build environment is much more variable), and so I'm strongly against its removal.

IIRC there was some discussion about whether a `.BUILDINFO` file should be included in the package itself or be a "sidecar" file, but that adds complexity, more files to sign and track, and removes what is currently built-in information that can be useful for someone to try and reproduce things locally (without having to download the sidecar with a weird path).


* Re: RFC: Check for reproducible builds.
  2021-04-30  8:18 [PR PATCH] RFC: Check for reproducible builds Gottox
  2021-04-30 12:53 ` [PR REVIEW] " ericonr
  2021-04-30 13:01 ` ericonr
@ 2021-04-30 15:55 ` ericonr
  2021-04-30 16:59 ` Chocimier
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ericonr @ 2021-04-30 15:55 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 308 bytes --]

New comment by ericonr on void-packages repository

https://github.com/void-linux/void-packages/pull/30588#issuecomment-830192259

Comment:
Furthermore, this would only make sense for the builders at an exact moment in time. Someone bootstrapping from zero would have different checksums at a lot of stages.


* Re: RFC: Check for reproducible builds.
  2021-04-30  8:18 [PR PATCH] RFC: Check for reproducible builds Gottox
                   ` (2 preceding siblings ...)
  2021-04-30 15:55 ` ericonr
@ 2021-04-30 16:59 ` Chocimier
  2021-04-30 17:18 ` ericonr
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Chocimier @ 2021-04-30 16:59 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 1266 bytes --]

New comment by Chocimier on void-packages repository

https://github.com/void-linux/void-packages/pull/30588#issuecomment-830230405

Comment:
1. I enjoy seeing effort to increase build reproducibility.
2. Ericonr is right that the presence of `source-revisions` increases reproducibility, as it records which dependencies, build tools and packaging tools (including xbps-src build styles and hooks), all of which affect the resulting package, were used. However, the guaranteed checksum mismatch between packages built from different commits, even when the diff is unrelated to the package, may still pose a problem.
3. Some packages will still come out identical even without reproducing every dependency recursively since foundation of distro. As I understand it, this PR aims to assess whether that is closer to 5 or 75 percent of packages.
4. How do you plan to use this hook? Is the idea to fill in the variable locally and then watch the hook messages on the builders? Or is it to store historical checksums in the source repository? I am asking, because we already collect checksum of packages in binary repo, as sig files. If your workflow could be reversed to verify official signatures against packages reproduced outside of builders, then all packages are already checksummed, and this hook may be not necessary.


* Re: RFC: Check for reproducible builds.
  2021-04-30  8:18 [PR PATCH] RFC: Check for reproducible builds Gottox
                   ` (3 preceding siblings ...)
  2021-04-30 16:59 ` Chocimier
@ 2021-04-30 17:18 ` ericonr
  2021-04-30 17:45 ` ericonr
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ericonr @ 2021-04-30 17:18 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 1580 bytes --]

New comment by ericonr on void-packages repository

https://github.com/void-linux/void-packages/pull/30588#issuecomment-830240975

Comment:
> Some packages will still come out identical even without reproducing every dependency recursively since foundation of distro.

For the record, `libzstd` doesn't guarantee identical results when using different versions. The binary format is always compatible, but different choices can be made.

> I am asking, because we already collect checksum of packages in binary repo, as sig files. If your workflow could be reversed to verify official signatures against packages reproduced outside of builders, then all packages are already checksummed, and this hook may be not necessary.

I think having it external to void-packages would be better... Having it be a field in templates means bigger diffs for updates, manual work to build the package for all archs we want to check (plus the possibility of forgetting to update one field or another), and other things. If we could have some way of pushing our local checksums from testing updates to a separate repo, and then have daily CI for that repository that checks if hashes from locally built packages match what's listed in repodata, that'd be almost as useful...

Anyway, we still would need to find a way to deal with metadata... Maybe unpack the package files and run a content checksum, ignoring the metadata? It wouldn't help catch issues in metadata generation (which there are, sometimes the list of libraries goes in a different order iirc), but it would be something.
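
A rough sketch of such a content checksum, assuming the package has already been unpacked into a directory and that `props.plist` is the only file to skip. The helper name `content_checksum` is hypothetical. It hashes regular files together with their relative paths, so it ignores file modes, ownership and symlinks. It also relies on GNU `sort -z` and `xargs -0 -r`.

```bash
# Hash the contents of an extracted package directory, ignoring props.plist.
content_checksum() {
	local dir=$1
	(
		cd "$dir" || exit 1
		# List regular files except the metadata file, in a stable order,
		# hash each one, then hash the resulting "<hash>  <path>" listing.
		find . -type f ! -path ./props.plist -print0 |
			LC_ALL=C sort -z |
			xargs -0 -r sha256sum |
			sha256sum |
			awk '{ print $1 }'
	)
}
```

Two builds whose extracted trees differ only in `props.plist` would then produce the same value, while any change in a shipped file would change it.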


* Re: RFC: Check for reproducible builds.
  2021-04-30  8:18 [PR PATCH] RFC: Check for reproducible builds Gottox
                   ` (4 preceding siblings ...)
  2021-04-30 17:18 ` ericonr
@ 2021-04-30 17:45 ` ericonr
  2021-04-30 23:56 ` Gottox
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ericonr @ 2021-04-30 17:45 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 1417 bytes --]

New comment by ericonr on void-packages repository

https://github.com/void-linux/void-packages/pull/30588#issuecomment-830256754

Comment:
So, thinking about this.

- the current version of this PR will almost never produce matching checksums, because of `source-revisions`; we would have to work around that if we want to go down this path
- very few people build packages locally with debug symbols, and `compile with debug + strip` is different from `compile without debug`, so anyone updating templates will need to consider it
- void doesn't have a package archive, nor do we have `BUILDINFO`. While `source-revisions` provides the git repo state (hooks and build styles), and, therefore, the version of all the packages listed in the template in `(host)makedepends`, it doesn't record the version of the dependencies of the packages listed in `(host)makedepends`. Specifying a `BUILDINFO` file to record that information would be necessary. The archive would be necessary to have somewhere to fetch the specific versions of packages from

The current vision of `reproducible-builds` usually means trying to have the same build environment to see if the package is deterministic. If we want to study what packages are "reproducible" when built with different dependencies etc. (remembering that even line numbers in headers can change debug info), I think we would have to call this effort something else.


* Re: RFC: Check for reproducible builds.
  2021-04-30  8:18 [PR PATCH] RFC: Check for reproducible builds Gottox
                   ` (5 preceding siblings ...)
  2021-04-30 17:45 ` ericonr
@ 2021-04-30 23:56 ` Gottox
  2021-05-01  0:00 ` Gottox
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Gottox @ 2021-04-30 23:56 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 333 bytes --]

New comment by Gottox on void-packages repository

https://github.com/void-linux/void-packages/pull/30588#issuecomment-830465113

Comment:
> I'm against this, because our build system isn't deterministic.

We shouldn't improve deterministic builds because we don't have deterministic builds. I don't see how that's a valid reason.


* Re: RFC: Check for reproducible builds.
  2021-04-30  8:18 [PR PATCH] RFC: Check for reproducible builds Gottox
                   ` (6 preceding siblings ...)
  2021-04-30 23:56 ` Gottox
@ 2021-05-01  0:00 ` Gottox
  2021-05-01  0:57 ` ericonr
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Gottox @ 2021-05-01  0:00 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 275 bytes --]

New comment by Gottox on void-packages repository

https://github.com/void-linux/void-packages/pull/30588#issuecomment-830466174

Comment:
There are problems with changing metadata, but there are ways around this. We have all the moving parts available and can work on them.


* Re: RFC: Check for reproducible builds.
  2021-04-30  8:18 [PR PATCH] RFC: Check for reproducible builds Gottox
                   ` (7 preceding siblings ...)
  2021-05-01  0:00 ` Gottox
@ 2021-05-01  0:57 ` ericonr
  2021-05-01  1:01 ` ericonr
  2021-05-21 13:49 ` [PR PATCH] [Closed]: " Gottox
  10 siblings, 0 replies; 12+ messages in thread
From: ericonr @ 2021-05-01  0:57 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 2888 bytes --]

New comment by ericonr on void-packages repository

https://github.com/void-linux/void-packages/pull/30588#issuecomment-830478097

Comment:
> We shouldn't improve deterministic builds because we don't have deterministic builds. I don't see how that's a valid reason.

The point is that the checksum won't match in 100% of the cases because of the metadata (specifically `source-revisions` - when you update the template with the new checksum for the package, that will change the commit hash and therefore the checksum; it's a loop). Most (all?) reproducible builds projects start from a binary artifact produced by the build and check if they can re-generate that based on information they had about the build environment, instead of checking if the builder produced the expected binary. I believe guix tries to check checksums from the bootstrap build process, but they always start from 0, which is a very different situation than what we have.

The determinism I was talking about wasn't in the sense of reproducible-builds, but in the sense that we can't control the value of `source-revisions` until the package has been built.

> There are problems with changing metadata, but there are ways around this. We have all the moving parts available and can work on them.

Which is why I'm proposing to study them. A content checksum (like our `@hash` checksum for templates) that skips `props.plist` would do most of the work (though it'd be slower). The issue then would be how to enforce policy to make people build packages for all archs, with the same settings used by builders (and then you have to take into account that `SOURCE_DATE_EPOCH` is derived from when a template was last touched, and that's also a loop of information, because a commit with a different date will once again alter the date, and this value *can* appear in normal files, for things like "build time"; this should be a rare case, though).
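
To make the `SOURCE_DATE_EPOCH` loop concrete, here is a sketch of one way to derive a timestamp from the last commit that touched a template. This is an illustration under assumptions, not necessarily the exact command xbps-src runs. The helper name `last_touched_epoch` is hypothetical.

```bash
# Print the commit time (Unix epoch) of the last commit that touched a path.
last_touched_epoch() {
	local repo=$1 path=$2
	git -C "$repo" log -1 --format=%ct -- "$path"
}
```

Because the value comes from the commit that last changed the template, committing an updated checksum into that same template produces a new timestamp, which is the loop described above.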

I don't think we have resources for a package archive or a "rebuilder" server that just tries to rebuild packages on its own, but I don't think the added workload and noise ("is this checksum wrong because someone forgot to update it or because something changed?") makes it worth having reproducible builds tracked in this repository. I would gladly contribute to a separate repository that tracks "verifications" from void users, and we could figure out some way to calculate "hashes" that skip the difference caused by building from different hashes (which would, unfortunately, partly defeat the purpose of reproducible-builds: to be able to verify that packages are exactly the same by simply comparing hashes from raw archives, without complicated tools).

To be clear, I am very interested in this work, and would like to see Void more reproducible (and have its reproducibility quantified as well). But I don't think this is a correct direction.


* Re: RFC: Check for reproducible builds.
  2021-04-30  8:18 [PR PATCH] RFC: Check for reproducible builds Gottox
                   ` (8 preceding siblings ...)
  2021-05-01  0:57 ` ericonr
@ 2021-05-01  1:01 ` ericonr
  2021-05-21 13:49 ` [PR PATCH] [Closed]: " Gottox
  10 siblings, 0 replies; 12+ messages in thread
From: ericonr @ 2021-05-01  1:01 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 2889 bytes --]

New comment by ericonr on void-packages repository

https://github.com/void-linux/void-packages/pull/30588#issuecomment-830478097

Comment:
> We shouldn't improve deterministic builds because we don't have deterministic builds. I don't see how that's a valid reason.

The point is that the checksum won't match in 100% of the cases because of the metadata (specifically `source-revisions` - when you update the template with the new checksum for the package, that will change the commit hash and therefore the checksum; it's a loop). Most (all?) reproducible builds projects start from a binary artifact produced by the build and check if they can re-generate that based on information they had about the build environment, instead of checking if the builder produced the expected binary. I believe guix tries to check checksums from the bootstrap build process, but they always start from 0, which is a very different situation than what we have.

The determinism I was talking about wasn't in the sense of reproducible-builds, but in the sense that we can't control the value of `source-revisions` until the package has been built.

> There are problems with changing metadata, but there are ways around this. We have all the moving parts available and can work on them.

Which is why I'm proposing to study them. A content checksum (like our `@hash` checksum for templates) that skips `props.plist` would do most of the work (though it'd be slower). The issue then would be how to enforce policy to make people build packages for all archs, with the same settings used by builders (and then you have to take into account that `SOURCE_DATE_EPOCH` is derived from when a template was last touched, and that's also a loop of information, because a commit with a different date will once again alter the date, and this value *can* appear in normal files, for things like "build time"; this should be a rare case, though).

I don't think we have resources for a package archive or a "rebuilder" server that just tries to rebuild packages on its own, but I don't think the added workload and noise ("is this checksum wrong because someone forgot to update it or because something changed?") makes it worth having reproducible builds tracked in this repository. I would gladly contribute to a separate repository that tracks "verifications" from void users, and we could figure out some way to calculate "hashes" that skip the difference caused by building from different commits (which would, unfortunately, partly defeat the purpose of reproducible-builds: to be able to verify that packages are exactly the same by simply comparing hashes from raw archives, without complicated tools).

To be clear, I am very interested in this work, and would like to see Void more reproducible (and have its reproducibility quantified as well). But I don't think this is a correct direction.


* Re: [PR PATCH] [Closed]: RFC: Check for reproducible builds.
  2021-04-30  8:18 [PR PATCH] RFC: Check for reproducible builds Gottox
                   ` (9 preceding siblings ...)
  2021-05-01  1:01 ` ericonr
@ 2021-05-21 13:49 ` Gottox
  10 siblings, 0 replies; 12+ messages in thread
From: Gottox @ 2021-05-21 13:49 UTC (permalink / raw)
  To: ml

[-- Attachment #1: Type: text/plain, Size: 1319 bytes --]

There's a closed pull request on the void-packages repository

RFC: Check for reproducible builds.
https://github.com/void-linux/void-packages/pull/30588

Description:
### Introduction

In void-packages the packages are anything but reproducible. Many other distributions, first and foremost [NixOS](https://nixos.org/) and even [Debian](https://wiki.debian.org/ReproducibleBuilds), have already done a lot of work to generate packages with stable checksums. Void's build system is able to do something similar - with a few constraints - without much work.

### This is a starting point, not more.

As a first step to actually get an idea of how bad the situation is, I implemented a simple checker that compares the checksum of packages defined in templates to the actual result and spits out warnings if they don't match.

This also introduces new variables to the templates:

`pkg_checksum_<arch>`, where `<arch>` is a sanitized version of the resulting architecture (for example, `x86_64_musl` for `x86_64-musl`).

### Constraints:

* The packages are currently built with the git hash baked in. This is an issue because the build is only stable within a given commit.

### ToDo

* The documentation is currently not done, but will be added later.
* Find a way to make our package format reproducible across commits.

