From: luigi scarso
Date: Fri, 1 Mar 2024 08:04:18 +0100
To: mailing list for ConTeXt users
Subject: [NTG-context] Re: Japanese

On Wed, 28 Feb 2024 at 20:53, Emanuel Han via ntg-context <ntg-context@ntg.nl> wrote:

> Thank you all for your suggestions and contributions to the wiki.
>
> I don't intend to nag, but when looking at what ConTeXt is producing, I
> need to state that the result is still far away from a properly typeset
> Japanese text.
>
> So the nihongo script which comes with ConTeXt handles *line breaks /
> line wrapping*. But the line break rules defined in it need a rework,
> because they don't follow the standards. The standards are documented here:
> https://www.w3.org/TR/jlreq/#possibilities_for_linebreaking_between_characters
> , and all affected characters are listed here:
> https://www.w3.org/TR/jlreq/tables/table_en3.pdf
>
> We have different rules, depending what kind of character is surpassing
> the text width (or is in its last position).
>
> Rule 1:
>
> Before closing brackets, closing quotation marks, iteration marks, the
> Prolonged sound mark and small Kana, line breaking is prohibited.
>
> ’”）〕］｝〉》」』】ヽヾゝゞ々ーぁぃぅぇぉァィゥェォっゃゅょッャュョ etc.
>
> The actual programmed behaviour by the nihongo script is that, if in the
> position which exceeds the line width, these characters jump to the next
> line and take the previous character with them. If they're in the last
> position of the line, they stay where they are. This behaviour is correct.
>
> Rule 2:
>
> After opening Brackets and opening quotation marks, line breaking is
> prohibited (but not before).
>
> ‘“（〔［｛〈《「『【
>
> The actual programmed behaviour by the nihongo script is that these
> characters jump to the next line and take the previous character with them.
> This behaviour is wrong. They should jump to the next line without taking
> the previous character with them, just like any regular character. The
> difference to a regular character is that they jump already when still
> within the line length, and they're in the last position of the line. The
> correct behaviour can be seen in LibreOffice Writer in action.
>
> Rule 3:
>
> Comma (tōten), full width comma, full stop
>
> 、，。
>
> The actual programmed behaviour by the nihongo script is that, if in the
> position which exceeds the line width, these characters jump to the next
> line and take the previous character with them. This behaviour is wrong.
> They have to be put back to the end of the previous line, but beyond the
> specified line length.
> (JIS Z 8125) (Search for "Line adjustment by hanging punctuation" under
> https://www.w3.org/TR/jlreq/ )
> If they're in the last position of the line, they stay where they are. The
> correct behaviour can be seen in LibreOffice Writer in action.
>
> Rules 4, 5, ...:
>
> Combinations of inseparable characters... (see
> https://www.w3.org/TR/jlreq/#possibilities_for_linebreaking_between_character
> ) and eventually more, which I didn't test.
>
> It might be useful to define three scripts nihongo_loose, nihongo_strict
> and nihongo_very_strict which each implement one of the 3 cases described
> here: https://www.w3.org/TR/jlreq/#addendum_a
>
> Regarding the *line gap* (Otared uses \setupwhitespace[big], which
> exceeds common line gaps), I'd like to quote from
> https://www.w3.org/TR/jlreq/ :
>
> *It is common that the line gap for the kihon-hanmen is set to a value
> between half-em spacing and the one em spacing of the character frame used
> for the kihon-hanmen. Half-em spacing can be chosen in cases where the line
> length is short, but one em spacing or close to it is more appropriate when
> the line length is longer than 35 characters.*
>
> I like the standard line gap which is provided by ConTeXt, which is
> equivalent to \setupwhitespace[0pt]. Even when using ruby, it works
> well. I found the best voffset for ruby to be -1.7ex.
>
> The *line adjustment* provided by ConTeXt by default does not meet the
> needs of Japanese (and Chinese) text, which follows a grid pattern.
> Especially the last line of a paragraph is squeezed, which is "hurting the
> eye".
>
> When characters need to jump to the next line due to previously discussed
> line breaking rules, ConTeXt seems to apply "Line adjustment by
> inter-character spacing expansion", which is a valid method according to
> https://www.w3.org/TR/jlreq/#line_adjustment , although "Line adjustment
> by inter-character spacing reduction" is preferred.
>
> The last point which ConTeXt is missing, when talking about Japanese
> typesetting, is vertical writing.
>
> I know, this is a lot of work. Hopefully, with joint efforts, we can
> make ConTeXt Japanese-ready.
>
> If I happen to have made false statements, please accept my apology. I
> tried to be of help as far as I could. I grew up in Japan and know more or
> less how typeset text should look.
>
> Emanuel

It would be nice if you could put your notes above into
https://wiki.contextgarden.net/Chinese_Japanese_and_Korean
or in general improve/maintain that page
(e.g. the links about the fonts are broken at the moment).
Perhaps with Jeong Dal?

Just to say, a few days ago I saw
https://ken-lunde.medium.com/genuine-han-unification-redux-3912b561ecae
(only webp images, so a bit tricky to make a pdf)

--
luigi
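
For anyone who wants to experiment with rules 1 to 3 quoted above before touching the wiki page, here is a minimal Python sketch of the behaviour Emanuel describes. It is not the code of the nihongo script that ships with ConTeXt and it is not a jlreq implementation: the names NO_BREAK_BEFORE, NO_BREAK_AFTER, HANGING, can_break_between and wrap are made up for this example, the character sets are only the samples listed in the message, and every character is assumed to be exactly one em wide.

```python
# Illustrative sketch only (assumption: every character is one em wide).
# The sets below are the sample characters from the message, not the
# complete jlreq character classes.

# Rule 1: a line must not start with one of these characters.
NO_BREAK_BEFORE = set("’”）〕］｝〉》」』】ヽヾゝゞ々ーぁぃぅぇぉァィゥェォっゃゅょッャュョ")
# Rule 2: a line must not end with one of these characters.
NO_BREAK_AFTER = set("‘“（〔［｛〈《「『【")
# Rule 3: these may hang one position beyond the line end.
HANGING = set("、，。")


def can_break_between(left: str, right: str) -> bool:
    """Return True if a line may end with `left` and the next start with `right`."""
    if right in NO_BREAK_BEFORE or right in HANGING:
        return False
    if left in NO_BREAK_AFTER:
        return False
    return True


def wrap(text: str, width: int) -> list[str]:
    """Greedily split `text` into lines of at most `width` characters,
    plus at most one hanging punctuation mark per line."""
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width  # tentative break before text[end]
        if text[end] in HANGING:
            end += 1  # rule 3: keep the mark on this line, past the width
        # Move the break to the left until it is permitted (rules 1 and 2).
        # If no permitted break exists, the line is cut after one character.
        while (end > start + 1 and end < len(text)
               and not can_break_between(text[end - 1], text[end])):
            end -= 1
        lines.append(text[start:end])
        start = end
    if start < len(text):
        lines.append(text[start:])
    return lines
```

With a width of 5, wrap("あいうえお、かきく", 5) leaves the comma hanging at the end of the first line, and wrap("あいうえ「かき」く", 5) moves the opening bracket to the second line without dragging the preceding character along, which is the behaviour asked for under rules 3 and 2. Line adjustment (spacing expansion or reduction) is deliberately left out.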


___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
webpage  : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
archive  : https://github.com/contextgarden/context
wiki     : https://wiki.contextgarden.net
___________________________________________________________________________________