9front - general discussion about 9front
* page epub support
@ 2020-11-16  8:31 umbraticus
  2020-11-16 17:27 ` [9front] " ori
  2020-11-17  2:42 ` sl
  0 siblings, 2 replies; 8+ messages in thread
From: umbraticus @ 2020-11-16  8:31 UTC
  To: 9front

diff -r c6e94385ea0f sys/src/cmd/page.c
--- a/sys/src/cmd/page.c	Sun Nov 15 22:47:45 2020 +0100
+++ b/sys/src/cmd/page.c	Mon Nov 16 20:55:22 2020 +1300
@@ -439,1 +439,1 @@
-		"print item[ref]; fflush}}'");
+		"print item[ref]}}'");

This patch makes this work for me:

; hget https://www.gutenberg.org/ebooks/5200.epub.noimages | page

With the flush, I only get the first (blank) page.  I think I
understand why the flush is there (to get each item as a separate read
in the while(read){addpage} that follows), so I don't really
understand why removing it makes things work...

Do other epubs work for other people? I haven't found one yet.

umbraticus



* Re: [9front] page epub support
  2020-11-16  8:31 page epub support umbraticus
@ 2020-11-16 17:27 ` ori
  2020-11-17 12:21   ` qwx
  2020-11-17  2:42 ` sl
  1 sibling, 1 reply; 8+ messages in thread
From: ori @ 2020-11-16 17:27 UTC
  To: umbraticus, 9front

Quoth umbraticus@prosimetrum.com:
> With the flush, I only get the first (blank) page.  I think I
> understand why the flush is there (to get each item as a separate read
> in the while(read){addpage} that follows), so I don't really
> understand why removing it makes things work...

This fix looks wrong.

It seems like we have a bug in our awk input buffering with
fflush:

	cpu% echo `''{seq 10}  | awk '{print $0; fflush}'
	1
	2
	3
	4
	5
	6
	7
	8
	9
	10
	
	cpu% echo `''{seq 10}  | awk '{print $0; fflush}'
	1

The fix should be in awk.



* Re: [9front] page epub support
  2020-11-16  8:31 page epub support umbraticus
  2020-11-16 17:27 ` [9front] " ori
@ 2020-11-17  2:42 ` sl
  1 sibling, 0 replies; 8+ messages in thread
From: sl @ 2020-11-17  2:42 UTC
  To: 9front

> Do other epubs work for other people? I haven't found one yet

Confirmed: epub files that used to work for me now behave like
your example.

my example:

	hget http://1oct1993.com/epub/1OCT1993.epub | page

does not display the whole book.

sl



* Re: [9front] page epub support
  2020-11-16 17:27 ` [9front] " ori
@ 2020-11-17 12:21   ` qwx
  2020-11-20  7:11     ` Anthony Martin
  0 siblings, 1 reply; 8+ messages in thread
From: qwx @ 2020-11-17 12:21 UTC
  To: 9front

> Quoth umbraticus@prosimetrum.com:
> > With the flush, I only get the first (blank) page.  I think I
> > understand why the flush is there (to get each item as a separate read
> > in the while(read){addpage} that follows), so I don't really
> > understand why removing it makes things work...
> 
> This fix looks wrong.
> 
> It seems like we have a bug in our awk input buffering with
> fflush:
[...]

This is an old bug introduced way back when we got native awk
thanks to spew.  It got discussed on irc, raised some technical
difficulties, then fell off the radar...  I kludged page just
as umbraticus has back then because I had to work on something
else, and completely forgot to ask about it.  I can see one
discussion about it on irc from 2016-08-18, but not much to go
on.  Sorry m(

qwx



* Re: [9front] page epub support
  2020-11-17 12:21   ` qwx
@ 2020-11-20  7:11     ` Anthony Martin
  2020-11-20  8:28       ` umbraticus
  2020-11-20 15:43       ` ori
  0 siblings, 2 replies; 8+ messages in thread
From: Anthony Martin @ 2020-11-20  7:11 UTC
  To: 9front

# HG changeset patch
# User Anthony Martin <ality@pbrane.org>
# Date 1605855926 28800
#      Thu Nov 19 23:05:26 2020 -0800
# Node ID 3d602d959da7944e6844100b536f5442e0970370
# Parent  27e5e30d3c30b64a9fb6455d81f10083a526972f
awk: fix truncated input after fflush

Before the "native" awk work, a call to the fflush function resulted
in one or more calls to the APE fflush(2).

Calling fflush on a stream open for reading has different behavior
based on the environment: within APE, it's a no-op¹; on OpenBSD, it's
an error²; in musl, it depends on whether or not the underlying file
descriptor is seekable³; etc. I'm sure glibc is subtly different.

Now that awk uses libbio, things are different: calling Bflush(2) on a
file open for reading simply discards any data in the buffer. This
explains why we're seeing truncated input. When awk attempts to read
in the next record, there's nothing in the buffer and no more data to
read so it gets EOF and exits normally. Note that this behavior is not
documented in bio(2). It was added in the second edition but I haven't
figured out why or what depends on it.

The simple fix is to have awk only call Bflush on files that were
opened for writing. You could argue that this is the only correct
behavior according to the awk(1) manual and it is, in fact, how GNU
awk behaves⁴.
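
As a rough illustration of the paragraphs above, here is a toy model in
portable C. It is not libbio and not awk's code: Stream, stream_fill,
stream_readline, stream_flush and count_records are hypothetical names
invented for this sketch, and it assumes the whole input is already
sitting in the buffer, as in the pipe example. It only shows how a flush
that discards read data turns several records into one, and how skipping
the flush for input streams avoids that.

```c
#include <stddef.h>
#include <string.h>

enum { MODE_READ, MODE_WRITE };

/* Toy stand-in for a buffered stream; NOT the real libbio Biobuf. */
typedef struct {
	int mode;      /* MODE_READ or MODE_WRITE */
	char buf[64];  /* data already pulled from the source */
	size_t len;    /* number of bytes held in buf */
	size_t pos;    /* index of the next unread byte */
} Stream;

/* Pretend a single read(2) delivered all of data into the buffer. */
void stream_fill(Stream *s, const char *data)
{
	size_t n = strlen(data);

	if (n > sizeof s->buf)
		n = sizeof s->buf;
	memcpy(s->buf, data, n);
	s->len = n;
	s->pos = 0;
}

/* Copy the next line, without its newline, into line; 0 means EOF. */
int stream_readline(Stream *s, char *line, size_t cap)
{
	size_t i = 0;

	if (cap == 0 || s->pos >= s->len)
		return 0;
	while (s->pos < s->len && s->buf[s->pos] != '\n') {
		if (i + 1 < cap)
			line[i++] = s->buf[s->pos];
		s->pos++;
	}
	if (s->pos < s->len)
		s->pos++;	/* step over the newline */
	line[i] = '\0';
	return 1;
}

/* Assumed behavior, per the description above: flushing a stream
 * open for reading throws away whatever is still buffered. */
void stream_flush(Stream *s)
{
	if (s->mode == MODE_READ) {
		s->len = 0;
		s->pos = 0;
	}
	/* a write stream would push its data out here; not modeled */
}

/* Read records, flushing after each one, and count how many were seen.
 * With guard set, the flush is skipped for streams not open for writing. */
int count_records(const char *input, int guard)
{
	Stream in;
	char line[32];
	int n = 0;

	in.mode = MODE_READ;
	stream_fill(&in, input);
	while (stream_readline(&in, line, sizeof line)) {
		n++;
		if (!guard || in.mode == MODE_WRITE)
			stream_flush(&in);
	}
	return n;
}
```

In this model, count_records("1\n2\n3\n", 0) sees a single record because
the first flush empties the input buffer, while count_records("1\n2\n3\n", 1)
sees all three.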

1. /sys/src/ape/lib/ap/stdio/fflush.c
2. https://cvsweb.openbsd.org/src/lib/libc/stdio/fflush.c?rev=1.9
3. https://git.musl-libc.org/cgit/musl/tree/src/stdio/fflush.c
4. https://git.savannah.gnu.org/cgit/gawk.git/tree/io.c#n1492

diff --git a/sys/src/cmd/awk/run.c b/sys/src/cmd/awk/run.c
--- a/sys/src/cmd/awk/run.c
+++ b/sys/src/cmd/awk/run.c
@@ -1707,6 +1707,8 @@
 	files[2].fp = &stderr;
 }

+#define writing(m) ((m) != LT && (m) != LE)
+
 Biobuf *openfile(int a, char *us)
 {
 	char *s = us;
@@ -1719,8 +1721,11 @@
 		if (files[i].fname && strcmp(s, files[i].fname) == 0) {
 			if (a == files[i].mode || (a==APPEND && files[i].mode==GT))
 				return files[i].fp;
-			if (a == FFLUSH)
+			if (a == FFLUSH) {
+				if(!writing(files[i].mode))
+					return nil;
 				return files[i].fp;
+			}
 		}
 	if (a == FFLUSH)	/* didn't find it, so don't create it! */
 		return nil;
@@ -1815,7 +1820,7 @@
 	int i;

 	for (i = 0; i < FOPEN_MAX; i++)
-		if (files[i].fp)
+		if (files[i].fp && writing(files[i].mode))
 			Bflush(files[i].fp);
 }




* Re: [9front] page epub support
  2020-11-20  7:11     ` Anthony Martin
@ 2020-11-20  8:28       ` umbraticus
  2020-11-20 23:29         ` qwx
  2020-11-20 15:43       ` ori
  1 sibling, 1 reply; 8+ messages in thread
From: umbraticus @ 2020-11-20  8:28 UTC
  To: 9front

> awk: fix truncated input after fflush

Great, thanks!

Does somebody want to apply this?

umbraticus



* Re: [9front] page epub support
  2020-11-20  7:11     ` Anthony Martin
  2020-11-20  8:28       ` umbraticus
@ 2020-11-20 15:43       ` ori
  1 sibling, 0 replies; 8+ messages in thread
From: ori @ 2020-11-20 15:43 UTC
  To: ality, 9front

Quoth Anthony Martin <ality@pbrane.org>:
> # HG changeset patch
> # User Anthony Martin <ality@pbrane.org>
> # Date 1605855926 28800
> #      Thu Nov 19 23:05:26 2020 -0800
> # Node ID 3d602d959da7944e6844100b536f5442e0970370
> # Parent  27e5e30d3c30b64a9fb6455d81f10083a526972f
> awk: fix truncated input after fflush
> 

Looks good to me, committed.

Thanks for the research and detailed writeup!



* Re: [9front] page epub support
  2020-11-20  8:28       ` umbraticus
@ 2020-11-20 23:29         ` qwx
  0 siblings, 0 replies; 8+ messages in thread
From: qwx @ 2020-11-20 23:29 UTC
  To: 9front

> awk: fix truncated input after fflush

Awesome job, thanks!

qwx

