ntg-context - mailing list for ConTeXt users
* nbsp in XML (S01E01)
@ 2021-04-21 18:17 Jano Kula
  2021-04-21 18:37 ` Hans van der Meer
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jano Kula @ 2021-04-21 18:17 UTC (permalink / raw)
  To: mailing list for ConTeXt users



Dear list,

this is the first episode of a series on nbsp in XML in LMTX.
Unfortunately, it is not as catchy as Netflix.

The XML input has two kinds of non-breaking space:

   - the Unicode character (U+00A0)
   - an HTML entity (in fact the ugly output of an HTML editor)

The HTML entity is handled by the ConTeXt preprocessor (great feature!), which
replaces it with either the Unicode nbsp character or a tilde.

The MWE shows that the Unicode spaces are non-breakable (see the ends of the
first lines), but they do not stretch (see the second line of each paragraph).

Does the Unicode nbsp have a fixed width in ConTeXt?

When the tilde is the replacement (uncomment the first replacement in the
preprocessor), \xmlflush displays a literal tilde (which, as a character, is
non-breakable and unstretchable, no surprise).

Why is the tilde displayed?

Replacing or adding nbsp (tilde) with finalizers gives different results;
see the next episode once this one is understood.

Thank you,
Jano

MWE (better to use the attached file, so as not to lose the invisible characters):

\startbuffer[doc]
<?xml version="1.0"?>
<document>
        <p>Temperature 20 °C 20 °C 20 °C 20 °C average.</p>
        <p>Altitude 6000&amp;nbsp;m 6000&amp;nbsp;m 6000&amp;nbsp;m
6000&amp;nbsp;m average.</p>
</document>
\stopbuffer

\startluacode
function lxml.preprocessor(data)
    -- data = string.gsub(data, "&amp;nbsp;", "~")
    -- replacement nbsp invisible in luacode
    data = string.gsub(data, "&amp;nbsp;", " ")
    return data
end
\stopluacode


\startxmlsetups xml:name
    \xmlsetsetup{\xmldocument}{*}{-}
    \xmlsetsetup{\xmldocument}{document|p}{xml:name:*}
\stopxmlsetups
\xmlregistersetup{xml:name}

\startxmlsetups xml:name:document
\xmlflush{#1}\par
\stopxmlsetups

\startxmlsetups xml:name:p
\parfillskip0pt\xmlflush{#1}\par
\stopxmlsetups

\startTEXpage[offset=5mm,width=60mm]
\xmlprocessbuffer{xml:name}{doc}{}
\stopTEXpage
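
Because the replacement character in the preprocessor is invisible, it is easy to lose when the MWE is copied. One way around that is to build the no-break space from its code point. The following is a minimal sketch in plain Lua (5.3 or later, which provides utf8.char). The helper name replace_nbsp_entity and the sample string are only illustrative; inside ConTeXt the same substitution would go into lxml.preprocessor.

```lua
-- Build U+00A0 from its code point so the character survives copy and paste.
local nbsp = utf8.char(0x00A0)

-- Hypothetical helper: the same substitution as in the preprocessor of the MWE.
local function replace_nbsp_entity(data)
    -- "&" and ";" are not magic characters in Lua patterns.
    -- gsub returns the new string and a match count; keep only the string.
    local result = string.gsub(data, "&amp;nbsp;", nbsp)
    return result
end

local sample = "<p>Altitude 6000&amp;nbsp;m average.</p>"
print(replace_nbsp_entity(sample))
```

The printed line looks as if it contained an ordinary space, but the byte sequence between "6000" and "m" is C2 A0, the UTF-8 encoding of U+00A0.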

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: nbsp in XML (S01E01)
  2021-04-21 18:17 nbsp in XML (S01E01) Jano Kula
@ 2021-04-21 18:37 ` Hans van der Meer
  2021-04-21 21:09   ` Jano Kula
  2021-04-21 21:28 ` mf
  2021-04-22  9:36 ` Hans Hagen
  2 siblings, 1 reply; 8+ messages in thread
From: Hans van der Meer @ 2021-04-21 18:37 UTC (permalink / raw)
  To: NTG ConTeXt



> Why tilde is displayed?

Wouldn't the simple answer be: because XML is not TeX?

dr. Hans van der Meer



* Re: nbsp in XML (S01E01)
  2021-04-21 18:37 ` Hans van der Meer
@ 2021-04-21 21:09   ` Jano Kula
  2021-04-21 21:17     ` denis.maier
  2021-04-22  6:02     ` Taco Hoekwater
  0 siblings, 2 replies; 8+ messages in thread
From: Jano Kula @ 2021-04-21 21:09 UTC (permalink / raw)
  To: mailing list for ConTeXt users



On Wed, Apr 21, 2021 at 8:37 PM Hans van der Meer <havdmeer@ziggo.nl> wrote:

> Why tilde is displayed?
>
>
> Wouldn't the simple answer not be: because XML is not TeX?
>

I see your point about the tilde: with finalizers in mind I was already in the
stomach while the mouth was still looking at the menu. Teaser for S01E02:
finalizers.

I would still expect the Unicode nbsp to be stretchable; otherwise I would have
to treat it somehow (no problem with that). Remember the times when a
non-stretchable, non-shrinkable nbsp was the first clue that a book was typeset
in Word? I've just checked, and it's still the case.

Thank you,
Jano


* Re: nbsp in XML (S01E01)
  2021-04-21 21:09   ` Jano Kula
@ 2021-04-21 21:17     ` denis.maier
  2021-04-22  6:02     ` Taco Hoekwater
  1 sibling, 0 replies; 8+ messages in thread
From: denis.maier @ 2021-04-21 21:17 UTC (permalink / raw)
  To: ntg-context

Re tilde: maybe the answer is in the entities section of the XML MkIV manual.

Denis




* Re: nbsp in XML (S01E01)
  2021-04-21 18:17 nbsp in XML (S01E01) Jano Kula
  2021-04-21 18:37 ` Hans van der Meer
@ 2021-04-21 21:28 ` mf
  2021-04-22  9:36 ` Hans Hagen
  2 siblings, 0 replies; 8+ messages in thread
From: mf @ 2021-04-21 21:28 UTC (permalink / raw)
  To: ntg-context

Try this:

%\xmltexentity{nbsp}{\nobreakspace}
\xmlsetentity{nbsp}{ } % U+00A0 NBSP between braces
%\xmlsetentity{nbsp}{ } % U+0020 normal space between braces

\startbuffer[doc]
<?xml version="1.0"?>
<document>
         <p>Temperature 20 °C 20 °C 20 °C 20 °C average.</p>
         <p>Altitude 6000&nbsp;m 6000&nbsp;m 6000&nbsp;m 6000&nbsp;m average.</p>
</document>
\stopbuffer

\startluacode
--[[
function lxml.preprocessor(data)
     -- data = string.gsub(data, "&nbsp;", "~")
     -- replacement nbsp invisible in luacode
     data = string.gsub(data, "&nbsp;", " ")
     return data
end
--]]
\stopluacode

\startxmlsetups xml:name
     \xmlsetsetup{\xmldocument}{*}{-}
     \xmlsetsetup{\xmldocument}{document|p}{xml:name:*}
\stopxmlsetups
\xmlregistersetup{xml:name}

\startxmlsetups xml:name:document
\xmlflush{#1}\par
\stopxmlsetups

\startxmlsetups xml:name:p
\parfillskip0pt\xmlflush{#1}\par
\stopxmlsetups

\startTEXpage[offset=5mm,width=60mm]
\xmlprocessbuffer{xml:name}{doc}{}
\stopTEXpage

Massi


* Re: nbsp in XML (S01E01)
  2021-04-21 21:09   ` Jano Kula
  2021-04-21 21:17     ` denis.maier
@ 2021-04-22  6:02     ` Taco Hoekwater
  1 sibling, 0 replies; 8+ messages in thread
From: Taco Hoekwater @ 2021-04-22  6:02 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi,


> On 21 Apr 2021, at 23:09, Jano Kula <jano.kula@gmail.com> wrote:
> 
> On Wed, Apr 21, 2021 at 8:37 PM Hans van der Meer <havdmeer@ziggo.nl> wrote:
>> Why tilde is displayed?
> 
> Wouldn't the simple answer not be: because XML is not TeX?

You are never going back to “TeX mode”: the preprocessor converts XML into *other* XML. 
And tilde in XML is just that: the ascii tilde glyph.
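
The effect can be checked outside ConTeXt by running the substitution as plain Lua. This is only a small sketch with a made-up sample string, and with_tilde is just a local name:

```lua
local sample = "<p>Altitude 6000&amp;nbsp;m</p>"

-- Same replacement as the commented-out line in the MWE; only the first
-- return value of gsub (the new string) is kept.
local with_tilde = string.gsub(sample, "&amp;nbsp;", "~")

print(with_tilde) -- <p>Altitude 6000~m</p>
```

The result is still XML text, so the tilde ends up as ordinary character data inside the p element.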

> 
> I still would expect unicode nbsp to be expandable, 

I agree with that, but for fine-tuning XML output I would use a trick like this:

\startluacode
function lxml.preprocessor(data)
    return string.gsub(data, "&amp;nbsp;", "<nbsp/>")
end
\stopluacode

\startxmlsetups xml:name
    ...
    \xmlsetsetup{\xmldocument}{document|nbsp}{xml:name:*}
\stopxmlsetups

\startxmlsetups xml:name:nbsp
    \penalty10000\hskip .3em plus 2em % or something, just a wild example.
\stopxmlsetups

Using an xml element would also allow your code to ‘look around’ to make sure all is 
well with its (typesetting) environment.
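
As a plain Lua check of what the preprocessor step above produces, here is a minimal sketch. The function name to_nbsp_element and the sample string are assumptions for illustration; the extra parentheses are there only to drop the match count that string.gsub returns as a second value.

```lua
-- Hypothetical stand-alone version of the entity-to-element substitution.
local function to_nbsp_element(data)
    -- Parentheses truncate gsub's two return values to the string alone.
    return (string.gsub(data, "&amp;nbsp;", "<nbsp/>"))
end

print(to_nbsp_element("<p>Altitude 6000&amp;nbsp;m</p>"))
-- <p>Altitude 6000<nbsp/>m</p>
```

Each entity becomes an empty nbsp element, which is what the xml:name:nbsp setup then gets to handle.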

Best wishes,
Taco

— 
Taco Hoekwater              E: taco@bittext.nl
genderfluid (all pronouns)




* Re: nbsp in XML (S01E01)
  2021-04-21 18:17 nbsp in XML (S01E01) Jano Kula
  2021-04-21 18:37 ` Hans van der Meer
  2021-04-21 21:28 ` mf
@ 2021-04-22  9:36 ` Hans Hagen
  2021-04-23 18:01   ` Jano Kula
  2 siblings, 1 reply; 8+ messages in thread
From: Hans Hagen @ 2021-04-22  9:36 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Jano Kula

On 4/21/2021 8:17 PM, Jano Kula wrote:

> Does unicode nbsp have fixed with in ctx?

sometimes ... but you just uncovered an old bug

     if attr >= 1 or attr <= 3 then -- flushright

someplace should be

     if attr >= 1 and attr <= 3 then -- flushright
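
Why the "or" version misbehaves can be seen in a few lines of plain Lua. This is only an illustrative sketch: the function names buggy and fixed are made up, and the surrounding ConTeXt code is not reproduced here.

```lua
-- With "or", the test holds for every number: anything below 1 is <= 3,
-- and anything else is >= 1.
local function buggy(attr)
    return attr >= 1 or attr <= 3
end

-- With "and", the test holds only for values in the range 1 to 3.
local function fixed(attr)
    return attr >= 1 and attr <= 3
end

for _, attr in ipairs { 0, 1, 3, 4 } do
    print(attr, buggy(attr), fixed(attr))
end
```

For 0 and 4 the first test is true while the second is false, so with "or" the branch was taken for values outside the intended range as well.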

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: nbsp in XML (S01E01)
  2021-04-22  9:36 ` Hans Hagen
@ 2021-04-23 18:01   ` Jano Kula
  0 siblings, 0 replies; 8+ messages in thread
From: Jano Kula @ 2021-04-23 18:01 UTC (permalink / raw)
  To: Hans Hagen; +Cc: mailing list for ConTeXt users



Hello,

the first episode was more dramatic than expected; this seems to be a good series.

On Thu, Apr 22, 2021 at 11:36 AM Hans Hagen <j.hagen@xs4all.nl> wrote:

> On 4/21/2021 8:17 PM, Jano Kula wrote:
>
> > Does unicode nbsp have fixed with in ctx?
>
> sometimes ... but you just uncovered an old bug
>      if attr >= 1 or attr <= 3 then -- flushright
> someplace should be
>      if attr >= 1 and attr <= 3 then -- flushright


After the patch, nbsp is working as expected.

On Thu, Apr 22, 2021 at 8:03 AM Taco Hoekwater <taco@bittext.nl> wrote:

> the preprocessor converts XML into *other* XML.

Useful information, thanks, wikified.

> And tilde in XML is just that: the ascii tilde glyph.

Yep, but \xmlfilter can process them nicely. See some next episode.


> for fine-tuning XML output I would use a trick like this:
>
> \startluacode
> function lxml.preprocessor(data)
>     return string.gsub(data, "&amp;nbsp;", "<nbsp/>")
> end
> \stopluacode
>
> \startxmlsetups xml:name
>     \xmlsetsetup{\xmldocument}{document|nbsp}{xml:name:*}
> \stopxmlsetups
>
> \startxmlsetups xml:name:nbsp
>     \penalty10000\hskip .3em plus 2em % or something, just a wild example.
> \stopxmlsetups
>
> Using an xml element would also allow your code to ‘look around’ to make
> sure all is
> well with its (typesetting) environment.


It didn't occur to me to use the preprocessor to turn the entity into a new
XML element. You are right, that gives even more control.

Thank you all for your help,
Jano

And thanks for watching!
