From: Hans van der Meer
Newsgroups: gmane.comp.tex.context
Subject: Re: nbsp in XML (S01E01)
Date: Wed, 21 Apr 2021 20:37:27 +0200
Message-ID: <6E3F7696-32FF-4CEF-A4B5-8CBAE8D2FE87@ziggo.nl>
To: NTG ConTeXt

> Why is the tilde displayed?

Wouldn't the simple answer be: because XML is not TeX?

dr. Hans van der Meer


On 21 Apr 2021, at 20:17, Jano Kula <jano.kula@gmail.com> wrote:

Dear list,

This is the first episode of a series on nbsp in XML in LMTX.
Unfortunately, it is not as catchy as Netflix.

The XML input used has two types of non-breaking space:
  • Unicode character (U+00A0)
  • HTML entity &nbsp; (in fact the ugly output of an HTML editor)
The HTML entity is preprocessed with the ConTeXt preprocessor (great feature!) and replaced by either the Unicode nbsp character or a tilde.
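
The substitution step described above can be sketched in plain Lua, outside ConTeXt. This is only an illustration under stated assumptions: the helper name `replace_nbsp_entities` is hypothetical (it is not part of ConTeXt), and the code assumes Lua 5.3 or later for the `utf8` library.

```lua
-- Minimal sketch of the substitution step (hypothetical helper, not ConTeXt API).
-- Requires Lua 5.3 or later for the utf8 library.
local NBSP = utf8.char(0xA0)  -- U+00A0 NO-BREAK SPACE, UTF-8 bytes C2 A0

-- Replace every "&nbsp;" entity in raw XML text with `replacement`
-- (U+00A0 by default). Returns the new text and the number of substitutions.
local function replace_nbsp_entities(xmltext, replacement)
    replacement = replacement or NBSP
    -- "%" is the only magic character in a gsub replacement string; escape it
    local safe = string.gsub(replacement, "%%", "%%%%")
    return string.gsub(xmltext, "&nbsp;", safe)
end

local input = "<p>Altitude 6000&nbsp;m 6000&nbsp;m average.</p>"

local with_nbsp, n = replace_nbsp_entities(input)
local with_tilde   = replace_nbsp_entities(input, "~")

print(n)           -- 2
print(with_tilde)  -- <p>Altitude 6000~m 6000~m average.</p>
```

Generating the character with `utf8.char(0xA0)` instead of typing it keeps the replacement visible in the source. In the tilde variant, the "~" ends up as ordinary character data in the XML text.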

The MWE shows that the Unicode spaces are non-breakable (see the ends of the first lines); however, they are not stretchable (see the second line of each paragraph).

Does the Unicode nbsp have a fixed width in ConTeXt?

When the tilde is the replacement in the preprocessor (uncomment the first replacement), \xmlflush displays the tilde (which, as a character, is non-breakable and unstretchable, no surprise).

Why is the tilde displayed?

Replacing or adding nbsp (tilde) with finalizers gives different results; see the next episode, once this one is understood.

Thank you,
Jano

MWE (better to use the attached file, so as not to lose invisible characters):

\startbuffer[doc]
<?xml version="1.0"?>
<document>
    <p>Temperature 20 °C 20 °C 20 °C 20 °C average.</p>
    <p>Altitude 6000&nbsp;m 6000&nbsp;m 6000&nbsp;m 6000&nbsp;m average.</p>
</document>
\stopbuffer

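\startluacode
function lxml.preprocessor(data)
    -- Alternative replacement: a literal tilde (this is the variant that shows "~")
    -- data = string.gsub(data, "&nbsp;", "~")
    -- The replacement is U+00A0 (nbsp); typed literally it is invisible and
    -- easily lost, so it is generated with utf8.char here
    data = string.gsub(data, "&nbsp;", utf8.char(0xA0))
    return data
end
\stopluacode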


\startxmlsetups xml:name
    \xmlsetsetup{\xmldocument}{*}{-}
    \xmlsetsetup{\xmldocument}{document|p}{xml:name:*}
\stopxmlsetups
\xmlregistersetup{xml:name}

\startxmlsetups xml:name:document
\xmlflush{#1}\par
\stopxmlsetups

\startxmlsetups xml:name:p
\parfillskip0pt\xmlflush{#1}\par
\stopxmlsetups

\startTEXpage[offset=5mm,width=60mm]
\xmlprocessbuffer{xml:name}{doc}{}
\stopTEXpage
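
Since the nbsp characters in the MWE above are invisible and easily turned into ordinary spaces by mail software, it can help to check a copied buffer before testing. The following is a small sketch in plain Lua, assuming Lua 5.3 or later and valid UTF-8 input; the function name `count_nbsp` is hypothetical.

```lua
-- Hypothetical helper for checking pasted input; requires Lua 5.3+ (utf8 library).
-- Counts the U+00A0 (no-break space) characters in a UTF-8 string.
local function count_nbsp(text)
    local count = 0
    -- utf8.codes iterates over (byte position, code point) pairs
    -- and raises an error if the string is not valid UTF-8
    for _, codepoint in utf8.codes(text) do
        if codepoint == 0xA0 then
            count = count + 1
        end
    end
    return count
end

local intact = "6000" .. utf8.char(0xA0) .. "m"  -- nbsp preserved
local pasted = "6000 m"                          -- nbsp turned into a plain space

print(count_nbsp(intact))  -- 1
print(count_nbsp(pasted))  -- 0
```

A count of zero for the first paragraph of the buffer would mean the Unicode nbsp did not survive the copy, in which case the attached file is the safer source.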
<xml-and-space-preprocessor.tex>
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________
