discuss@mandoc.bsd.lv
* zstd compression
@ 2025-01-28  7:10 Sören Tempel
  2025-01-28 10:40 ` Ingo Schwarze
  0 siblings, 1 reply; 5+ messages in thread
From: Sören Tempel @ 2025-01-28  7:10 UTC (permalink / raw)
  To: discuss

Hi,

Compression schemes beyond gzip are currently being adopted by Linux
distributions for man page compression. Case in point: Guix [1] has
switched to zstd compression recently, which has sadly rendered the
Guix mandoc package defunct [2].

For other downstream packagers impacted by this, I wanted to point out
that the zstd project provides a wrapper library which is API-compatible
with zlib and easily allows adapting software using zlib (e.g. mandoc)
to zstd compression. A sample patchset for doing that is available in
the Guix patch tracker [3].

Given mandoc's stance on man page compression [4], this is rather
intended as an FYI for downstream packagers, but in case there is
interest in proper upstream support for additional compression
algorithms, feel free to let me know.

Cheers!
Sören

[1]: https://guix.gnu.org/
[2]: https://issues.guix.gnu.org/68242
[3]: https://issues.guix.gnu.org/75501
[4]: https://inbox.vuxu.org/mandoc-discuss/20201129201424.GI58187@athene.usta.de
--
 To unsubscribe send an email to discuss+unsubscribe@mandoc.bsd.lv



* Re: zstd compression
  2025-01-28  7:10 zstd compression Sören Tempel
@ 2025-01-28 10:40 ` Ingo Schwarze
  2025-01-29 23:36   ` Alexis
       [not found]   ` <87wmeep1wj.fsf@gmail.com>
  0 siblings, 2 replies; 5+ messages in thread
From: Ingo Schwarze @ 2025-01-28 10:40 UTC (permalink / raw)
  To: Soeren Tempel; +Cc: discuss

Hi Soeren,

Soeren Tempel wrote on Tue, Jan 28, 2025 at 08:10:53AM +0100:

> Compression schemes beyond gzip are currently being adopted by Linux
> distributions for man page compression. Case in point: Guix [1]

Yikes.  The goal at the very foundation of that software looks
misguided in the first place.  The purpose of privileged accounts
is to make sure that users cannot change the configuration of the
machine.  Installing software is a typical example where using a
privileged account makes sense and is often even required because
it often involves creating daemon accounts or groups or directories
owned by specific users or groups or setting up jobs to run
periodically.  Also, it's good when system administrators have
an idea what people are running on a machine in order to keep it
properly secured, which also means that it's better when installation
is done by the admin.

Yes, i know users can always install their own software in their
home directory.  While trying to prevent that with technical means
is usually stupid because it doesn't work very well, people should
be discouraged from installing software they need themselves rather
than encouraged.

On a single-user machine, installing software with your normal
user account would be so stupid that it would leave me speechless.

So, what is the point?

Also, which system uses Guix?  I mean, a package manager normally
needs to be tightly integrated with the operating system; a stand-alone
package manager is an oxymoron (yes, i know the NetBSD thingy, pkgsrc,
wants to do precisely that, but that's mostly useful for exotic systems
coming with no package manager whatsoever, like AIX).

Are there any sane distros switching to zstd?

> has switched to zstd compression recently, which has sadly rendered the
> Guix mandoc package defunct [2].
> 
> For other downstream packagers impacted by this, I wanted to point out
> that the zstd project provides a wrapper library which is API-compatible
> with zlib and easily allows adapting software using zlib (e.g. mandoc)
> to zstd compression. A sample patchset for doing that is available in
> the Guix patch tracker [3].
> 
> Given mandoc's stance on man page compression [4], this is rather
> intended as an FYI for downstream packagers, but in case there is
> interest in proper upstream support for additional compression
> algorithms, feel free to let me know.

I feel seriously tempted to rip out compression support altogether.
It's just so anachronistic.

By the way, your comment in the thread

  "Usually, its hard to convince them to add features that do not
   benefit OpenBSD."

is misleading.  I have implemented many features that do not benefit
OpenBSD in any way whatsoever, and some of these are even available
in the OpenBSD base system and not only in mandoc portable:

 * Style checks for manual pages specific to NetBSD.
 * The mdoc(7) .Lb macro, extensively used by FreeBSD and some others.
 * Very careful support for manual sections that are not pure
   numbers but have alphabetic suffixes, mostly for Illumos and
   Solaris, but also used by some Linux distributions.
 * ...

Some features are only in portable, not in OpenBSD:

 * Support for "make install" installing symlinks rather than
   hardlinks.
 * The READ_ALLOWED_PATH feature for Homebrew and Mac OS X
 * Support for installing libmandoc.a, mostly for NetBSD
 * A complete program, mandocd(8), mostly for Debian
 * and lots of other stuff, most of it visible in
   configure.local.example

But if somebody wants a feature and cannot explain what the benefit
is, i do not want such a feature to complicate the code.  Then again,
in this case, it might not complicate the code at all, in which
case i might be willing to accommodate even a foolish feature.

That said, let's see what can be done for your case.
It appears you need four elements:

 * Five new *.o files.  I definitely do not want to add the
   source code for these five files to the mandoc repo.
   Can you instead install a system-wide shared object
   containing that code, such that mandoc can dynamically link,
   just like it is dynamically linking zlib right now?
 * The argument -lzstd on the ld(1) command line.
   That should be easy with an option in configure.local.example.
 * #include <zstd_zlibwrapper.h> in read.c.  Can easily be controlled
   with the same option.
 * Recognizing the .zst file extension.  Can easily be controlled
   with the same option.
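
A minimal sketch of how one build option could control the extension
check described in the last item.  The macro HAVE_ZSTD_ZLIBWRAPPER and
the function names has_suffix() and is_compressed_page() are hypothetical
and chosen for illustration only; this is not mandoc's actual code, and
the option would really be set by the configuration system rather than
defaulted in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical option name; defaulted to 1 here only so that the
 * sketch compiles on its own.
 */
#ifndef HAVE_ZSTD_ZLIBWRAPPER
#define HAVE_ZSTD_ZLIBWRAPPER 1
#endif

/* Return nonzero if name ends with suffix. */
int
has_suffix(const char *name, const char *suffix)
{
	size_t	 nlen, slen;

	assert(name != NULL && suffix != NULL);
	nlen = strlen(name);
	slen = strlen(suffix);
	return nlen >= slen && strcmp(name + nlen - slen, suffix) == 0;
}

/* Return nonzero if the file name looks like a compressed manual page. */
int
is_compressed_page(const char *name)
{
	if (has_suffix(name, ".gz"))
		return 1;
#if HAVE_ZSTD_ZLIBWRAPPER
	/* Only recognized when the wrapper library is configured in. */
	if (has_suffix(name, ".zst"))
		return 1;
#endif
	return 0;
}
```

With the option disabled, "ls.1.zst" would be treated like any other
uncompressed file name, so the additional complexity stays behind a
single preprocessor condition.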

By the way, is explicitly linking to both shared objects really
necessary?  Why does the libzstd_zlibwrapper.so (or however it is
called) not know that it depends on libz.so and let ld.so do
the work?  Not sure whether that's possible, if not, "-lz -lzstd"
is not out of the question, either.

Yours,
  Ingo

* Re: zstd compression
  2025-01-28 10:40 ` Ingo Schwarze
@ 2025-01-29 23:36   ` Alexis
       [not found]   ` <87wmeep1wj.fsf@gmail.com>
  1 sibling, 0 replies; 5+ messages in thread
From: Alexis @ 2025-01-29 23:36 UTC (permalink / raw)
  To: discuss

Ingo Schwarze <schwarze@usta.de> writes:

> Also, which system uses Guix?

GNU Guix System:

  https://en.wikipedia.org/wiki/GNU_Guix#Guix_System_(operating_system)


Alexis.

* Re: zstd compression
       [not found]   ` <87wmeep1wj.fsf@gmail.com>
@ 2025-01-30  9:59     ` Ingo Schwarze
       [not found]       ` <87v7tvoiqu.fsf@gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Schwarze @ 2025-01-30  9:59 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: Soeren Tempel, discuss

Hello Maxim,

thanks for explaining.  I see, "Guix System" is also an operating system,
which is not immediately obvious from the web page (but only becomes
clear after following the "about" link).  I now also understand that
some steps have been taken to mitigate some of the security implications
of the fundamental concept.  It all sounds extremely complicated, which
is usually not very good for security - then again, this is probably
the usual tradeoff that every system has to make: OpenBSD leans towards
security and simplicity, Guix appears to lean more towards fancy
features at the expense of significant complication.

Anyway, i have now added Guix here:

  http://mandoc.bsd.lv/ports.html  # three places: overview, packages,
                                     maintenance
  http://mandoc.bsd.lv/porthistory.html  # two entries


Maxim Cournoyer wrote on Wed, Jan 29, 2025 at 03:30:52PM +0900:
> Ingo Schwarze wrote:

>> I feel seriously tempted to rip out compression support altogether.
>> It's just so anachronistic.

> It costs relatively little in complexity,

As long as only one compression format is required, the complexity is
moderate, and that contributed to it not being ripped out yet.  It gets
worse when different systems require different compression.

Let's wait for Soeren - if he manages to get the compat lib installed
as a shared object, then i can do the zstd thing in the upstream repo
as explained earlier, with the (small) additional complexity confined
to the configuration system, without invading the code.

[...]
> That's a 50 MiB space saving, which while not enormous is not nothing
> either, especially on a transactional system like Guix where you may be
> keeping multiple past versions of things with the space usage adding up.

OK, let's assume the worst: every user installs all the packages
you have on your system, but in such a way that no two users
install the same version of any package.  Then it boils down to
50 MiB per user, which i would indeed call "nothing".  I mean,
a few lines above, you talked about making it easy for users to
compile chromium.  Compared to that, 50 MiB is absolutely nothing
indeed.  I think you can safely assume every user does at least
*something* that requires space in amounts typical for the 2020s,
and compared to that, 50 MiB is nothing.

Or look at it the other way: how many users do you have on your
machine?  Two hundred, maybe?  In that case, you gain 10 GB.
On which modern machine that serves hundreds of users do 10 GB
of disk space matter?
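
The estimate above can be checked with a few lines of C.  This is only
a sketch of the arithmetic: the function names are hypothetical, and the
inputs (200 users, 50 MiB each) are the figures from this message.

```c
#include <assert.h>
#include <limits.h>

/* Total saving in MiB if each of `users` users saves `mib_per_user`. */
unsigned long
total_savings_mib(unsigned long users, unsigned long mib_per_user)
{
	/* Guard against overflow in the multiplication. */
	assert(mib_per_user == 0 || users <= ULONG_MAX / mib_per_user);
	return users * mib_per_user;
}

/* Convert MiB to GiB (1 GiB = 1024 MiB). */
double
mib_to_gib(unsigned long mib)
{
	return (double)mib / 1024.0;
}
```

For 200 users at 50 MiB each, this gives 10000 MiB, or roughly 9.77 GiB,
which matches the "10 GB" order of magnitude used in the argument.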

Yours,
  Ingo

* Re: zstd compression
       [not found]       ` <87v7tvoiqu.fsf@gmail.com>
@ 2025-01-31 12:43         ` Ingo Schwarze
  0 siblings, 0 replies; 5+ messages in thread
From: Ingo Schwarze @ 2025-01-31 12:43 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: Soeren Tempel, discuss

Hello Maxim,

Maxim Cournoyer wrote on Fri, Jan 31, 2025 at 10:49:13AM +0900:

> I researched the question briefly, and unless I'm missing something, the
> Zstd documentation currently suggests to add the few source files to the
> project

Right.  That's an extremely foolish recommendation.
Are they expecting me to perform a complete source code audit
of their files?  Not realistic, i would say.
Are they expecting me to include unaudited third-party code
into my repository?  Not realistic, either.

What if a vulnerability is found in one of these files?
Suddenly large numbers of completely unrelated projects
that included these files have to make urgent security releases?
That way lies madness, it's completely unsustainable.

> and weave them in the build [0]; the zlibWrapper Makefile also
> doesn't currently have a target to build a shared library.  I've created
> a feature request in the Zstd project to see if they'd consider adding a
> proper shared library for zlibWrapper [1].
> [0]  https://github.com/facebook/zstd/blob/dev/zlibWrapper/README.md
> [1]  https://github.com/facebook/zstd/issues/4277

Thank you for opening the ticket!  :-)

I have added a comment to improve the chances that they understand
why this matters.

> it feels like good hygiene to shave a bit of disk space usage
> where possible,

I think here we can happily agree to disagree.  I consider it good
hygiene to shave a bit of complexity where possible, and where
adding complexity would not provide any relevant benefit.

> especially if it's transparent to users (which it is).

It is not transparent to users.  While i use apropos(1) more frequently
than grep(1) on manual pages, i sometimes do run commands like

  # look for mdoc(7) pages with strangely formatted line macro
  grep -R '^\.  *[A-Z][a-z]' /usr/share/man
  # look for man(7) pages with strangely formatted line macro; finds a few
  grep -R '^\.  *[A-Z][A-Z]' /usr/share/man
  # look for a typical typo; needs fulltext search, cannot use semantic search
  grep -RF 'the the' /usr/share/man

You could argue that i could use zgrep(1) instead, but that would
certainly and pointlessly complicate the user experience.

Yours,
  Ingo