ntg-context - mailing list for ConTeXt users
* Unexpected space after hyphen in xml/html export
From: Rik Kabel @ 2018-10-06 22:19 UTC
  To: mailing list for ConTeXt users


List,

Occasionally an unexpected and unwanted space is inserted following the 
hyphen of a compound word in html/xml exports. In a document with about 
500 such compounds, this occurs 30 times.

The following input:

    \setupbackend     [export=yes,xhtml=yes]
    \starttext
    Theocracy, the priest power; monarchy, the one|-|man power; and
    oligarchy, the few|-|men power|—|are three forms of vicarious
    government over the people, perhaps for them, not by them. Democracy is
    direct self|-|government over all the people, for all the people, by
    all the people. Our institutions are democratic: theocratic, monarchic,
    oligarchic vicariousness is all gone. We have no Divine vicar who is
    responsible to God for our politics and religion; only a human attorney,
    answerable to the people for his official work. The axis of rotation has
    changed: the equator of the old civilization passes through the poles
    of the new. This makes some change in the geography of both Church and
    State.
    \stoptext

Produces, in relevant part, the following xml (wrapped for convenience):

    Theocracy, the priest power; monarchy, the one-man power; and oligarchy,
    the few- men power—are three forms of vicarious government over
    the people, perhaps for them, not by them. Democracy is direct
    self-government over all the people, for all the people, by all the
    people. Our institutions are democratic: theocratic, monarchic,
    oligarchic vicariousness is all gone. We have no Divine vicar who is
    responsible to God for our politics and religion; only a human attorney,
    answerable to the people for his official work. The axis of rotation has
    changed: the equator of the old civilization passes through the poles
    of the new. This makes some change in the geography of both Church and
    State.</document>

Note the space after "few-" in the second line of the output text.

(The paragraph is a quotation from Theodore Parker's sermon "The Effect 
of Slavery on the American People," delivered on July 4, 1858. It is 
thought by many to be the inspiration for part of Lincoln's Gettysburg 
Address.)

-- 
Rik


___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________

* Re: Unexpected space after hyphen in xml/html export
From: Hans Hagen @ 2018-10-06 23:28 UTC
  To: mailing list for ConTeXt users, Rik Kabel

On 10/7/2018 12:19 AM, Rik Kabel wrote:
> (The paragraph is a quotation from Theodore Parker's sermon "The Effect 
> of Slavery on the American People," delivered on July 4, 1858. It is 
> thought by many to be the inspiration for part of Lincoln's Gettysburg 
> Address.)

But that's not what happened: quite a few people in power have medieval, 
monarchic characteristics, oligarchies are still around, etc. Old 
institutions (which are probably deeply rooted in mankind) are just better 
at pretending to be different.

Anyway, this is fixed in the next beta (but you need to keep an eye on 
side effects involving discretionary nodes).

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Unexpected space after hyphen in xml/html export
From: Rik Kabel @ 2018-10-08 20:24 UTC
  To: Hans Hagen, mailing list for ConTeXt users


On 10/6/2018 19:28, Hans Hagen wrote:
> But that's not what happened: quite a few people in power have 
> medieval, monarchic characteristics, oligarchies are still around, etc. 
> Old institutions (which are probably deeply rooted in mankind) are just 
> better at pretending to be different.
>
> Anyway, this is fixed in the next beta (but you need to keep an eye on 
> side effects involving discretionary nodes).
>
> Hans
Alas, it is fixed for that particular occurrence, but it still occurs 29 
times in the document (using today's beta).

A more extended search shows that there are also spaces after en-dashes 
(in "Press|–|Citizen" and in "Miniatur|–|Bibliothek der Deutschen 
Classiker"), but none after em-dashes. Unfortunately, my attempts to 
reproduce this in a smaller document have so far failed.

Perhaps this quote, in which the problem also occurs, is in line with 
your other comments:

    There is only one party in the United States, the Property
    Party\nbsp \dots{} and it has two right wings: Republican
    and Democrat. Republicans are a bit stupider, more rigid,
    more doctrinaire in their laissez|-|faire capitalism than
    the Democrats, who are cuter, prettier, a bit more
    corrupt—until recently\nbsp \dots{} and more willing than the
    Republicans to make small adjustments when the poor, the black,
    the anti|-|imperialists get out of hand. But, essentially, there
    is no difference between the two parties.

(That is from Gore Vidal in 1975. Plus ça change.) In it, I get a space 
after "anti-".

But any more of this and folks will complain about politics on the list. 
Or, worse, encourage it.
-- 
Rik

* Re: Unexpected space after hyphen in xml/html export
From: Hans Hagen @ 2018-10-08 22:32 UTC
  To: Rik Kabel, mailing list for ConTeXt users


> Alas, it is fixed for that particular occurrence, but it still occurs 29 
> times in the document (using today's beta).
> 
> A more extended search shows that there are also spaces after en-dashes 
> (in "Press|–|Citizen" and in "Miniatur|–|Bibliothek der Deutschen 
> Classiker"), but none after em-dashes. Unfortunately, my attempts to 
> reproduce this in a smaller document have so far failed.
Well, you know: no MWE (minimal working example), no solution.

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Unexpected space after hyphen in xml/html export
From: Rik Kabel @ 2018-10-10 18:50 UTC
  To: Hans Hagen, mailing list for ConTeXt users


On 10/8/2018 18:32, Hans Hagen wrote:
>
>> Alas, it is fixed for that particular occurrence, but it still occurs 
>> 29 times in the document (using today's beta).
>>
>> A more extended search shows that there are also spaces after 
>> en-dashes (in "Press|–|Citizen" and in "Miniatur|–|Bibliothek der 
>> Deutschen Classiker"), but none after em-dashes. Unfortunately, my 
>> attempts to reproduce this in a smaller document have so far failed.
> Well, you know: no MWE (minimal working example), no solution.
And here is the MWE. The culprit, it appears, is bidi. I have tried all 
documented options (but not all combinations) for \setupdirections, and 
the only one under which the problem does not occur is "off". With bidi 
active, a spurious space appears wherever a line break is introduced. As 
the example demonstrates, this is not a function of the compounds, but of 
hyphenation in general.

    \setupbackend     [export=yes]
    \setupdirections  [bidi=on]
    \starttext
    abraca% adjust to cause hyphenation with your textwidth
    abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra
    abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra
    abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra
    abra-cadabra abra-cadabra abra-cadabra abra-cadabra
    abra-cadabra abra-cadabra abra-cadabra abra-cadabra
    abra-cadabra abra-cadabra abra-cadabra abra-cadabra
    \stoptext

(The problem appears in the export html/xml file, not in the pdf.)
-- 
Rik

* Re: Unexpected space after hyphen in xml/html export
From: Rik Kabel @ 2018-10-10 19:11 UTC
  To: Hans Hagen, mailing list for ConTeXt users


On 10/10/2018 14:50, Rik Kabel wrote:
> And here is the MWE. The culprit, it appears, is bidi. I have tried 
> all documented options (but not all combinations) for 
> \setupdirections, and the only one under which the problem does not 
> occur is "off". With bidi active, a spurious space appears wherever a 
> line break is introduced. As the example demonstrates, this is not a 
> function of the compounds, but of hyphenation in general.
>
>     \setupbackend     [export=yes]
>     \setupdirections  [bidi=on]
>     \starttext
>     abraca% adjust to cause hyphenation with your textwidth
>     abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra
>     abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra
>     abra|-|cadabra abra|-|cadabra abra|-|cadabra abra|-|cadabra
>     abra-cadabra abra-cadabra abra-cadabra abra-cadabra
>     abra-cadabra abra-cadabra abra-cadabra abra-cadabra
>     abra-cadabra abra-cadabra abra-cadabra abra-cadabra
>     \stoptext
>
> (The problem appears in the export html/xml file, not in the pdf.)
>
Correction: the problem is not a function of explicit compounds (||) but 
of the hyphenation of compounds, explicit or not. Normal hyphenation does 
not bring it about.

-- 
Rik


* Re: Unexpected space after hyphen in xml/html export
From: Rik Kabel @ 2018-10-11 21:01 UTC
  To: ntg-context, Hans Hagen


On 10/10/2018 15:11, Rik Kabel wrote:
> Correction: the problem is not a function of explicit compounds (||) but 
> of the hyphenation of compounds, explicit or not. Normal hyphenation does 
> not bring it about.
>
I also note that \setupdirections, with every option combination I have 
tried, has no discernible effect on my export output. It can therefore 
safely be removed from the export mode of my document, so for me this 
issue disappears. I do not know whether this holds in general.
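For anyone hitting the same problem, that workaround might look roughly 
like the sketch below. It assumes the export run is selected through a 
user-defined mode; the mode name "export" is a hypothetical choice, set on 
the command line with "context --mode=export file.tex". It only keeps 
\setupdirections[bidi=on] out of the export run and does not fix the 
underlying bug; whether dropping bidi is harmless depends on the document.

    % Sketch only: "export" is a hypothetical, user-chosen mode name.
    % PDF run:    context file.tex
    % Export run: context --mode=export file.tex

    \startmode[export]
      % XML/HTML export, with bidi left off (avoids the spurious space)
      \setupbackend[export=yes]
    \stopmode

    \startnotmode[export]
      % bidi stays active for the PDF run only
      \setupdirections[bidi=on]
    \stopnotmode

    \starttext
    abra|-|cadabra abra-cadabra abra|-|cadabra abra-cadabra
    \stoptext

Keeping both setups in one file means the PDF run is unchanged, while the 
export run simply never sees the bidi setting.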
-- 
Rik

Thread overview: 7 messages
2018-10-06 22:19 Unexpected space after hyphen in xml/html export Rik Kabel
2018-10-06 23:28 ` Hans Hagen
2018-10-08 20:24   ` Rik Kabel
2018-10-08 22:32     ` Hans Hagen
2018-10-10 18:50       ` Rik Kabel
2018-10-10 19:11         ` Rik Kabel
2018-10-11 21:01           ` Rik Kabel
