From: Denis Maier via ntg-context
Newsgroups: gmane.comp.tex.context
Subject: Re: XML, dealing with whitespace
Date: Mon, 17 Jan 2022 09:47:17 +0000

Hi Wolfgang,

 

From: Wolfgang Schuster <wolfgang.schuster.lists@gmail.com>
Sent: Saturday, 15 January 2022 20:28
To: mailing list for ConTeXt users <ntg-context@ntg.nl>; Denis Maier via ntg-context <ntg-context@ntg.nl>
Cc: Maier, Denis Christian (UB) <denis.maier@unibe.ch>
Subject: Re: [NTG-context] XML, dealing with whitespace

 

Denis Maier via ntg-context wrote on 15.01.2022 at 13:04:

Hi all,

 

I have sources that look like this:

 

%%%%%%%%%%%%%%%%%%%%%

<?xml version="1.0" encoding="UTF-8"?>
<article>
   <p>Bla Bla Bla</p>
   <p>
      <underline>
         <italic>Bla</italic>
      </underline>, Bla Bla.</p>
</article>

%%%%%%%%%%%%%%%%%%%%%

 

Typesetting this with context gives me a spurious space after the underlined Bla in italics.


There is no spurious space; the line break is just converted to a space, and I see no reason why this shouldn't happen. To remove space before or after certain parts of text within a paragraph you can use the \removeunwantedspaces and \ignorespaces commands.

 

Yes, that's absolutely true. From TeX's point of view, the space is not spurious. It's entirely adequate to treat the newline as a space here.

 

As I outlined in my original post, the problem occurs because XSLT adds this indentation. FWIW, I finally found a solution, which seems to have been added in XSLT 3.0 (after being available as a Saxon extension): there's a new attribute «suppress-indentation» on xsl:output that can be used to control this:
«This is a new property in XSLT 3.0 (it was previously available in Saxon as an extension). The value is a whitespace-separated list of element names, and it typically identifies "inline" elements that should not cause indentation; in XHTML, for example, these would be b, i, span, and the like.»

(https://www.saxonica.com/documentation9.5/xsl-elements/output.html)

 

So, the solution to my problem is this:

 

<xsl:output
    method="xml"
    indent="yes"
    suppress-indentation="italic underline"
    />
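
For context, here is a minimal sketch of a complete stylesheet around that declaration. It is only an illustration, not my actual stylesheet: the identity transform via xsl:mode is an assumption chosen to keep the example short, and it presumes an XSLT 3.0 processor (such as Saxon) and input that carries no indentation whitespace of its own. Only the element names italic and underline are taken from the sample above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: an identity transform that re-serializes its input.
     The element names italic and underline come from the sample source;
     the rest is assumed for illustration. -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Indent the output, but treat the listed elements as inline:
         the serializer should not insert indentation whitespace
         in their content. -->
    <xsl:output
        method="xml"
        indent="yes"
        suppress-indentation="italic underline"/>

    <!-- XSLT 3.0 shorthand for the identity transform:
         nodes matched by no template rule are copied as they are. -->
    <xsl:mode on-no-match="shallow-copy"/>

</xsl:stylesheet>
```

With a setup along these lines, the italic element should end up on the same line as its surrounding underline tags, so ConTeXt has no line break to turn into a space there.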

 

Denis

 



%%%% begin example
\starttexdefinition RemovePreceding #1
    \removeunwantedspaces
    #1
\stoptexdefinition

\starttexdefinition RemoveFollowing #1
    #1
    \ignorespaces
\stoptexdefinition

\starttext

Bla \RemovePreceding{Bla} Bla

Bla \RemoveFollowing{Bla} Bla

\stoptext
%%%% end example

When only following spaces are a problem, a better alternative to \ignorespaces is \autoinsertnextspace, which checks the following token and inserts a space only when the next character is not punctuation.

%%%% begin example
\starttexdefinition Italic #1
    \emphasized{#1}
    \autoinsertnextspace
\stoptexdefinition

\starttexdefinition Underbar #1
    \underbar{#1}
\stoptexdefinition

\starttext

Bla Bla Bla

\Underbar{\Italic{Bla} , Bla Bla.}

\stoptext
%%%% end example

Wolfgang
