From: Wolfgang Schuster
To: mailing list for ConTeXt users, Emanuel Han via ntg-context
Subject: [NTG-context] Re: Japanese
Date: Wed, 28 Feb 2024 22:18:35 +0100

Emanuel Han via ntg-context wrote on 28.02.2024 at 20:51:
> Thank you all for your suggestions and contributions to the wiki.
>
> I don't intend to nag, but looking at what ConTeXt produces, I have to say that the result is still far from properly typeset Japanese text.
>
> The nihongo script that comes with ConTeXt handles line breaking. But the line-break rules defined in it need a rework, because they don't follow the standards. The standards are documented here: https://www.w3.org/TR/jlreq/#possibilities_for_linebreaking_between_characters , and all affected characters are listed here: https://www.w3.org/TR/jlreq/tables/table_en3.pdf
>
> There are different rules, depending on what kind of character exceeds the text width (or sits in the last position of the line).
>
> Rule 1:
>
> Line breaking is prohibited before closing brackets, closing quotation marks, iteration marks, the prolonged sound mark and small kana.
>
> ’”)〕]}〉》」』】ヽヾゝゞ々ーぁぃぅぇぉァィゥェォっゃゅょッャュョ etc.
>
> The behaviour currently implemented in the nihongo script is that, when one of these characters lands in the position that exceeds the line width, it moves to the next line and takes the previous character with it. If it is in the last position of the line, it stays where it is. This behaviour is correct.
>
> Rule 2:
>
> Line breaking is prohibited after opening brackets and opening quotation marks (but not before them).
>
> ‘“(〔[{〈《「『【
>
> The behaviour currently implemented in the nihongo script is that these characters move to the next line and take the previous character with them. This behaviour is wrong. They should move to the next line without taking the previous character with them, just like any regular character. The difference from a regular character is that they already move when they are still within the line length but in the last position of the line. The correct behaviour can be seen in LibreOffice Writer.

Can you provide a minimal example? This should already work correctly; if it doesn't, it's a bug.
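
A test file along the following lines would be enough to check Rule 2. It is only a sketch: the font name is an assumption (any installed Japanese font will do), and the box width and the sample string are made up so that an opening bracket falls at or near the end of a line. Adjust the width until 「 really is the last character that fits.

```tex
% Hypothetical test file for the Rule 2 report.
% Assumption: a font named "Noto Serif CJK JP" is installed; replace as needed.
\definefontfamily[jp][rm][Noto Serif CJK JP]
\setupbodyfont[jp]

% Enable the Japanese line-breaking rules discussed in this thread.
\setscript[nihongo]

\starttext

% A narrow framed paragraph forces line breaks after a few characters.
\framed[width=10em,align=normal]
  {あいうえおかきくけ「こさしすせそ」たちつてとなにぬねの}

\stoptext
```

Attaching the resulting PDF together with the source makes it easy to see which character was carried to the next line.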

> Rule 3:
> Comma (tōten), full-width comma, full stop
>
> 、,。
>
> The behaviour currently implemented in the nihongo script is that, when one of these characters lands in the position that exceeds the line width, it moves to the next line and takes the previous character with it. This behaviour is wrong.
> It has to be put back at the end of the previous line, but beyond the specified line length. (JIS Z 8125) (Search for "Line adjustment by hanging punctuation" under https://www.w3.org/TR/jlreq/ )
> If it is in the last position of the line, it stays where it is. The correct behaviour can be seen in LibreOffice Writer.

This is handled by the protrusion mechanism and enabled with paragraph alignment.
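
In practice this usually comes down to two settings. The sketch below uses the protrusion font feature and the hanging alignment option from general ConTeXt usage; whether the chosen protrusion vector covers 、 and 。 for a given font is an assumption that should be checked against the output.

```tex
% Add protrusion to the default feature set.
% This has to happen before the body font is loaded.
\definefontfeature[default][default][protrusion=quality]

% Let the paragraph builder use the protrusion values,
% so that punctuation may hang beyond the right margin.
\setupalign[hanging]
```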

> Rules 4, 5, ...:
> Combinations of inseparable characters... (see https://www.w3.org/TR/jlreq/#possibilities_for_linebreaking_between_character ) and possibly more, which I didn't test.
>
> It might be useful to define three scripts, nihongo_loose, nihongo_strict and nihongo_very_strict, each of which implements one of the 3 cases described here: https://www.w3.org/TR/jlreq/#addendum_a
>
> Regarding the line gap (Otared uses \setupwhitespace[big], which exceeds common line gaps), I'd like to quote from https://www.w3.org/TR/jlreq/ :
>
> It is common that the line gap for the kihon-hanmen is set to a value between half-em spacing and the one em spacing of the character frame used for the kihon-hanmen. Half-em spacing can be chosen in cases where the line length is short, but one em spacing or close to it is more appropriate when the line length is longer than 35 characters.
>
> I like the standard line gap provided by ConTeXt, which is equivalent to \setupwhitespace[0pt]. Even with ruby it works well. I found the best voffset for ruby to be -1.7ex.

The \setupwhitespace setting controls the distance between paragraphs, but what you're looking for is the \setuplinespace command, which controls the distance between lines.
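
The difference between the two can be sketched as follows. The sketch assumes that the interline command meant here is \setupinterlinespace, the name under which it is commonly documented. The value 1.75em is only an illustration derived from the jlreq quote: a character frame of 1em plus a line gap between 0.5em and 1em gives a baseline distance between 1.5em and 2em.

```tex
% Distance between paragraphs: none, so paragraphs are not pushed apart.
\setupwhitespace[none]

% Distance between lines (baseline to baseline):
% 1em character frame + 0.75em line gap, an illustrative value only.
\setupinterlinespace[line=1.75em]
```

The interline setting is usually placed after \setupbodyfont.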

> The line adjustment that ConTeXt provides by default does not meet the needs of Japanese (and Chinese) text, which follows a grid pattern. In particular, the last line of a paragraph is squeezed, which hurts the eye.
> When characters need to move to the next line because of the line-breaking rules discussed above, ConTeXt seems to apply "Line adjustment by inter-character spacing expansion", which is a valid method according to https://www.w3.org/TR/jlreq/#line_adjustment , although "Line adjustment by inter-character spacing reduction" is preferred.
>
> The last thing ConTeXt is missing for Japanese typesetting is vertical writing.

Vertical typesetting is possible, but only for small text blocks that fit on a single page. Text that spans multiple pages isn't supported yet (it was possible ages ago in MkII) because nobody has needed it so far.

Wolfgang
