From: Benjamin Buchmuller via ntg-context <ntg-context@ntg.nl>
To: Max Chernoff <mseven@telus.net>
Cc: Benjamin Buchmuller <benjamin.buchmuller@gmail.com>,
mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: Count (and limit) glyphs per line?
Date: Sun, 26 Jun 2022 11:59:01 -0400
Message-ID: <FC9CC7C3-631B-4C57-8DD4-556EBEA93ED4@gmail.com>
In-Reply-To: <fc5cc061-eb8b-8288-2859-469a2e384dac@telus.net>
Hi Max,
Thank you so much for your help and for pointing me to the documentation; there is always a lot to learn in TeX!
I'm afraid that including the hyphen width doesn't solve the issue yet. It only seems to move the problem to other parts of the text. My guess is that one could equivalently have said "local max_length = 111", right?
I made the following MWE (also reproducible online) to illustrate what I see:
* Here, the trouble is caused not by a breakpoint but by material that cannot be broken (the \hbox). This leaves the next line underfull. (I get a lot of these in my own document, and also some at hyphenated breakpoints. Maybe the insertion point of the penalty/breaking bonus needs to move further up?)
* Using hsize alone makes the problem worse in itemizations, so I think localhsize is the way to go. My guess is that localhsize is the width of the "text" part of a paragraph, for example excluding the symbols in an itemization.
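The fallback described above (use localhsize when it is set, otherwise textwidth) can be sketched in plain Lua. This is only an illustration: the table passed in stands in for tex.dimen, the function name effective_width is hypothetical, and the values are made-up scaled points rather than anything measured in the MWE.

```lua
-- Mock of the width lookup; `dimen` stands in for tex.dimen (an assumption,
-- so that the sketch runs outside ConTeXt).
local function effective_width(dimen)
    local localhsize = dimen.localhsize or 0
    if localhsize > 0 then
        return localhsize      -- e.g. inside an itemize, where it is set
    end
    return dimen.textwidth     -- fallback while localhsize is still 0
end

local pt = 65536               -- scaled points per point
print(effective_width { textwidth = 300 * pt, localhsize = 0 })
print(effective_width { textwidth = 300 * pt, localhsize = 240 * pt })
```

Keeping the lookup in one small function would also make it easy to swap in tex.hsize later, should that turn out to be the right dimension after all.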
(More thoughts below)
\startluacode
local max_length = 112

local glyph_id = node.id "glyph"
local disc_id = node.id "disc"
local glue_id = node.id "glue"

function userdata.limiter(head)
    head = language.hyphenate(head)

    local hyphen = node.new "glyph"
    hyphen.char = language.prehyphenchar(0)
    hyphen.font = font.current()
    local width = hyphen.width
    node.free(hyphen)

    local chars = 0
    local n = head
    while n do
        if n.id == glyph_id or n.id == glue_id then
            chars = chars + 1
            width = width + n.width - (n.shrink or 0)
        end

        local localhsize = tex.dimen["textwidth"]
        if tex.dimen["localhsize"] > 0 then
            localhsize = tex.dimen["localhsize"]
        end

        if chars >= max_length or width > localhsize then
            local back_chars = 0
            local end_disc = nil

            while n do
                if n.id == glue_id then
                    local penalty = node.new "penalty"
                    penalty.penalty = -10000
                    node.insertbefore(head, n, penalty)
                    break
                end

                if not end_disc and n.id == disc_id then
                    end_disc = n
                end

                if end_disc and back_chars >= 5 then
                    end_disc.penalty = -10000
                    break
                end

                if n.id == glyph_id then
                    back_chars = back_chars + 1
                end

                n = n.prev
            end

            width = 0
            chars = 0
        end

        n = n.next
    end

    return head
end

nodes.tasks.appendaction(
    "processors",
    "before",
    "userdata.limiter"
)
\stopluacode
\setuppapersize[A5]
\showframe
\starttext
This is text width:
\ctxlua{context(tex.dimen["textwidth"])}
This is hsize:
\ctxlua{context(tex.dimen["hsize"])}
This is localhsize:
\ctxlua{context(tex.dimen["localhsize"])}
\startitemize[width=5em]
\item Thus, I came to the conclusion that the \hbox{designer} of a new system must not only be the implementer and first large--scale user; the de signer should also write the first user manual.
\item \samplefile{knuth}
This is text width:
\ctxlua{context(tex.dimen["textwidth"])}
This is hsize:
\ctxlua{context(tex.dimen["hsize"])}
This is localhsize:
\ctxlua{context(tex.dimen["localhsize"])}
\stopitemize
\stoptext
I'm wondering whether I understand the second while loop correctly:
* Once we find the node that exceeds either the character limit or the (local) hsize (glyphs and glues are summed; for a disc we add hyphen.width, don't we?), we insert an extremely good breakpoint for a new line and exit the loop.
* The other cases still seem a bit obscure to me, so I tried to trace where each of them is triggered:
    if n.id == glue_id then
        local penalty = node.new "penalty"
        penalty.penalty = -10000
        node.insertbefore(head, n, penalty)
        context.inrightmargin("glue")
        break
    end

    if not end_disc and n.id == disc_id then
        context.inrightmargin("disc")
        end_disc = n
    end

    if end_disc and back_chars >= 5 then
        context.inrightmargin("end")
        end_disc.penalty = -10000
        break
    end

    if n.id == glyph_id then
        context.inrightmargin("glyph")
        back_chars = back_chars + 1
    end
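To see which branch ends the backward scan without running TeX, the branch order can be replayed on a mock node list in plain Lua. This is a sketch under stated assumptions: nodes are ordinary tables whose id is a string, the helpers build and scan_back are hypothetical names, and no penalty is actually inserted; the function only reports which branch would have fired and at which node.

```lua
-- Build a doubly linked list of mock nodes from a list of kind names.
local function build(kinds)
    local nodes = {}
    for i, kind in ipairs(kinds) do
        nodes[i] = { id = kind, index = i }
        if i > 1 then
            nodes[i].prev = nodes[i - 1]
            nodes[i - 1].next = nodes[i]
        end
    end
    return nodes
end

-- Walk backwards from `start`, mirroring the branch order of the inner loop.
-- Returns the branch that ended the scan and the index of the chosen node.
local function scan_back(start)
    local back_chars, end_disc = 0, nil
    local n = start
    while n do
        if n.id == "glue" then
            return "glue", n.index          -- penalty would go before this glue
        end
        if not end_disc and n.id == "disc" then
            end_disc = n                    -- remember the nearest discretionary
        end
        if end_disc and back_chars >= 5 then
            return "disc", end_disc.index   -- forced break at the remembered disc
        end
        if n.id == "glyph" then
            back_chars = back_chars + 1
        end
        n = n.prev
    end
    return "none", nil                      -- ran off the start of the list
end

-- A disc close to the end, but a glue is reached before 5 glyphs are counted.
local short = build { "glyph", "glue", "glyph", "glyph", "disc", "glyph", "glyph" }
print(scan_back(short[#short]))

-- A long word: 5 glyphs are counted before any glue, so the disc is used.
local long = build { "glue", "glyph", "glyph", "glyph", "glyph", "disc",
                     "glyph", "glyph", "glyph", "glyph" }
print(scan_back(long[#long]))
```

In this model the disc branch only fires when at least five glyphs have been walked over before any glue is met; otherwise the scan continues past the discretionary and stops at the glue.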
Maybe I'm doing this wrong, but I see these conditions triggered more often than I would expect for a 25-line document?
local count_me = 0
...
if chars >= max_length or width > localhsize then
    local back_chars = 0
    local end_disc = nil

    while n do
        local check = "glyph"
        count_me = count_me + 1

        if n.id == glue_id then
            local penalty = node.new "penalty"
            penalty.penalty = -10000
            node.insertbefore(head, n, penalty)
            context.inrightmargin("\\color[red]{" .. string.rep("_", count_me) .. count_me .. "}")
            break
        end

        if not end_disc and n.id == disc_id then
            end_disc = n
        end
        --
        if end_disc and back_chars >= 5 then
            context.inrightmargin("\\color[blue]{" .. string.rep("_", count_me) .. count_me .. "}")
            end_disc.penalty = -10000
            break
        end

        if n.id == glyph_id then
            context.inrightmargin("\\color[black]{" .. string.rep("_", count_me) .. count_me .. "}")
            back_chars = back_chars + 1
        end

        n = n.prev
    end
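One possible reason for the high counts is that count_me is incremented once per node visited while walking back, not once per forced break, and it is never reset between breaks. The plain-Lua sketch below illustrates the difference on a mock paragraph. It is only a model: characters stand in for nodes (a space is a glue, anything else a glyph), only the glue branch is simulated, the names to_kinds and simulate are hypothetical, and a guard that is not in the original code stops the walk at the previous break so the mock always terminates.

```lua
-- Turn a string into a flat array of mock node kinds.
local function to_kinds(text)
    local kinds = {}
    for ch in text:gmatch(".") do
        kinds[#kinds + 1] = (ch == " ") and "glue" or "glyph"
    end
    return kinds
end

-- Count forced breaks and inner-loop iterations for a character limit.
local function simulate(text, max_length)
    local kinds = to_kinds(text)
    local breaks, visits = 0, 0
    local chars, last_break = 0, 0
    local i = 1
    while i <= #kinds do
        chars = chars + 1                  -- every mock node is a glyph or a glue
        if chars >= max_length then
            local j = i
            while j > last_break and kinds[j] ~= "glue" do
                visits = visits + 1        -- one iteration per glyph walked over
                j = j - 1
            end
            if j > last_break then         -- found a glue after the previous break
                visits = visits + 1        -- the iteration that finds the glue
                breaks = breaks + 1
                last_break = j
                i = j                      -- counting resumes after the glue
            end
            chars = 0
        end
        i = i + 1
    end
    return breaks, visits
end

local breaks, visits = simulate("aaaa bbbb cccc dddd eeee ffff", 12)
print(breaks, visits)
```

For this input the model reports two breaks but six inner-loop iterations, so a counter placed inside the inner loop grows roughly with the number of nodes walked back rather than with the number of lines.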
Many thanks again!
Benjamin
> On Jun 25, 2022, at 17:40, Max Chernoff <mseven@telus.net> wrote:
>
>> It's also a very insightful example of how to use and inject Lua code in the TeX output routine.
>
> This is injecting Lua code before the paragraph builder, not in the output routine. Something like https://tex.stackexchange.com/a/644613/270600 or my module "lua-widow-control" would be an example of Lua code in the output routine.
>
>> Do you mind if I add it to the wiki? (Probably under "Wrapping".)
>
> Sure
>
>> However, tex.localhsize (or tex.dimen["localhsize"]) is 0 when the document is initialized. (Maybe a more sensible default would be textwidth rather than 0?)
>> So, I added:
>> local localhsize = tex.dimen["textwidth"]
>>
>> if tex.dimen["localhsize"] > 0 then
>> localhsize = tex.dimen["localhsize"]
>> end
>> if chars >= max_length or width > localhsize then
>
> I don't think that's necessary. \hsize is a primitive TeX parameter that sets the width of the paragraph. It may be zero at the start of the document, but it is definitely non-zero by the end of every paragraph.
>
> The Lua function gets the current value of \hsize at the end of every paragraph, so it should be using the exact same value that TeX's paragraph builder uses, meaning that it should account for itemizations and such. I'm not really sure what \localhsize is, but it's probably similar to \hsize.
>> (2) I'm (now?) running into trouble with hyphenation.
>> In my own document, I also get lines with only a single character or hboxed group. I assume, this is because the hyphen is not counted and pushes the remainder to a new line where the intended breakpoint again starts another one.
>
> Try this:
>
> \startluacode
> local max_length = 112
>
> local glyph_id = node.id "glyph"
> local disc_id = node.id "disc"
> local glue_id = node.id "glue"
>
> function userdata.limiter(head)
>     language.hyphenate(head)
>
>     local hyphen = node.new "glyph"
>     hyphen.char = language.prehyphenchar(0)
>     hyphen.font = font.current()
>     local width = hyphen.width
>     node.free(hyphen)
>
>     local chars = 0
>     local n = head
>     while n do
>         if n.id == glyph_id or n.id == glue_id then
>             chars = chars + 1
>             width = width + n.width - (n.shrink or 0)
>         end
>
>         if chars >= max_length or width > tex.hsize then
>             local back_chars = 0
>             local end_disc = nil
>
>             while n do
>                 if n.id == glue_id then
>                     local penalty = node.new "penalty"
>                     penalty.penalty = -10000
>                     node.insertbefore(head, n, penalty)
>                     break
>                 end
>
>                 if not end_disc and n.id == disc_id then
>                     end_disc = n
>                 end
>
>                 if end_disc and back_chars >= 5 then
>                     end_disc.penalty = -10000
>                     break
>                 end
>
>                 if n.id == glyph_id then
>                     back_chars = back_chars + 1
>                 end
>
>                 n = n.prev
>             end
>
>             width = 0
>             chars = 0
>         end
>
>         n = n.next
>     end
>
>     return head
> end
>
> nodes.tasks.appendaction(
>     "processors",
>     "before",
>     "userdata.limiter"
> )
> \stopluacode
>
> I've just added the width of a hyphen to the accumulated width. Let me know if this works; if not, there's a more complex fix that I can try.
>
>> Unfortunately, I don't know what to change; I know a bit about "glyph" and "glue", but what is "disc" and would it help here?
>
> "disc" nodes are "discretionaries", which are usually potential hyphens. See "The TeXbook" (page 95) or "TeX by Topic" (https://texdoc.org/serve/texbytopic/0#subsection.19.3.1) for details on the TeX side, or the LuaMetaTeX manual (https://www.pragma-ade.com/general/manuals/luametatex.pdf#%231205) for details on the Lua side.
>
> -- Max
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://context.aanhet.net
archive : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___________________________________________________________________________________