ntg-context - mailing list for ConTeXt users
* Straight Quotes / Curly Quotes
@ 2021-06-17 20:28 Thangalin
  2021-06-17 22:10 ` Henning Hraban Ramm
  0 siblings, 1 reply; 10+ messages in thread
From: Thangalin @ 2021-06-17 20:28 UTC (permalink / raw)
  To: mailing list for ConTeXt users



I've written a Java-based lexer/parser that can convert straight quotes to
curly quotes for English prose. It's a one-pass algorithm (O(n)) that uses
neither look-behind nor regex. Here's a list of test cases it handles:

https://raw.githubusercontent.com/DaveJarvis/keenquotes/main/lib/src/test/resources/com/keenwrite/quotes/smartypants.txt

A test harness converted several Project Gutenberg texts quite well. The
folks at PG may be interested in using it themselves to help convert quotes
in older texts en masse. The source code is MIT-licensed:

https://github.com/DaveJarvis/keenquotes/

The code should port to Lua fairly easily, should anyone be interested in
adding a straight/curly quotation mark conversion module to ConTeXt.
(Similar to the LaTeX csquotes package, but without using regex.)
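
For readers who want a feel for the single-pass idea, here is a minimal sketch in Python. It is not Java and it is not the keenquotes algorithm; the names `curl_quotes` and `OPEN_CONTEXT` are assumptions made for illustration. It scans left to right, consults only the character it has just emitted, and uses no regex. It gets leading apostrophes (as in 'twas) wrong, which is one of the cases a real implementation has to treat specially.

```python
# Characters after which a straight quote is assumed to open a quotation:
# whitespace, opening brackets, and an opening curly quote (for nesting).
OPEN_CONTEXT = set(" \t\n([{\u2018\u201c")


def curl_quotes(text: str) -> str:
    """Replace straight quotes with curly quotes in one left-to-right pass."""
    out = []
    for ch in text:
        prev = out[-1] if out else ""  # last emitted character, if any
        opens = prev == "" or prev in OPEN_CONTEXT
        if ch == '"':
            out.append("\u201c" if opens else "\u201d")
        elif ch == "'":
            # A non-opening single quote doubles as the apostrophe (U+2019).
            out.append("\u2018" if opens else "\u2019")
        else:
            out.append(ch)
    return "".join(out)


print(curl_quotes('"Don\'t," she said.'))  # “Don’t,” she said.
```

Each character is visited once, so the running time is linear in the length of the input.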

Cheers everyone!


___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: Straight Quotes / Curly Quotes
  2021-06-17 20:28 Straight Quotes / Curly Quotes Thangalin
@ 2021-06-17 22:10 ` Henning Hraban Ramm
  2021-06-17 22:35   ` Hans Hagen
  2021-06-18  2:08   ` Thangalin
  0 siblings, 2 replies; 10+ messages in thread
From: Henning Hraban Ramm @ 2021-06-17 22:10 UTC (permalink / raw)
  To: mailing list for ConTeXt users

I usually convert all kinds of quotation marks into \quotation{} / \quote{} using the regex search of my editor; a regex replacement is also part of my docx-to-ConTeXt converter script. (I see no need to avoid regexes, but YMMV.)
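
A regex replacement of this kind might look like the following Python sketch. It is not the converter script mentioned above; `to_quotation` and `PAIRS` are hypothetical names, and only paired double quotation marks within a single line are handled.

```python
import re

# (left, right) pairs of double quotation marks. The German pair comes first
# because its closing mark (U+201C) is also the English opening mark.
PAIRS = [("\u201e", "\u201c"), ("\u201c", "\u201d"), ('"', '"')]


def to_quotation(text: str) -> str:
    r"""Wrap paired double quotation marks in \quotation{...}."""
    for left, right in PAIRS:
        # Match left mark, then anything except either mark or a newline,
        # then the right mark.
        pattern = re.compile(
            re.escape(left) + "([^" + left + right + "\n]+)" + re.escape(right)
        )
        text = pattern.sub(lambda m: "\\quotation{" + m.group(1) + "}", text)
    return text


print(to_quotation('He said "hi" twice.'))  # He said \quotation{hi} twice.
```

The order of `PAIRS` matters only because some marks play different roles in different languages, which is the mixed-marks problem described next.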

The biggest problems I face are mixed and wrong quotation marks, e.g. English marks in a German text, a mixture of curly/straight marks, traditional LaTeX quotation marks and similar mistakes. Some programs default to English single quotes with German double quotes :(

In what kind of workflows does your program make sense?
(Please don’t be offended, my view is limited.)

Hraban


* Re: Straight Quotes / Curly Quotes
  2021-06-17 22:10 ` Henning Hraban Ramm
@ 2021-06-17 22:35   ` Hans Hagen
  2021-06-18  2:08   ` Thangalin
  1 sibling, 0 replies; 10+ messages in thread
From: Hans Hagen @ 2021-06-17 22:35 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Henning Hraban Ramm

On 6/18/2021 12:10 AM, Henning Hraban Ramm wrote:
> I usually convert all kinds of quotation marks into \quotation{} / \quote{} using the regex search of my editor; a regex replacement is also part of my docx-to-ConTeXt converter script. (I see no need to avoid regexes, but YMMV.)
> 
> The biggest problem I face are mixed and wrong quotation marks, e.g. English marks in a German text, a mixture of curly/straight marks, traditional LaTeX q. marks and similar mistakes. Some programs have a default of English single quotes with German double quotes :(
> 
> In what kind of workflows does your program make sense?
> (Please don’t be offended, my view is limited.)
lua is normally fast enough to handle it with a few expressions or lpeg
but in the end it depends on how far one will go

for instance, if it is for converting gutenberg files then extensive
conversion can help ... with intermediate test runs (for instance
coloring quotations quickly shows a runaway that then can be fixed in
the input)

For instance:

"Not all open quotes are closed...

kind of tricky because there one needs to know the source so there is no 
real universal solution (one could layer it)
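
A rough diagnostic in the same spirit as coloring, sketched in Python under stated assumptions: paragraphs are separated by blank lines, only straight double quotes are counted, and `unbalanced_paragraphs` is a hypothetical name. An odd count is only a hint, since English fiction conventionally reopens a quotation at each new paragraph without closing the previous one, so the source still has to be known.

```python
def unbalanced_paragraphs(text: str) -> list:
    """Return 1-based indexes of paragraphs with an odd number of '"' marks."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return [i for i, p in enumerate(paragraphs, start=1) if p.count('"') % 2]


sample = '"Closed."\n\n"Not all open quotes are closed...\n\nNo quotes here.'
print(unbalanced_paragraphs(sample))  # [2]
```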

in the past we had projects where we did the rendering and used tex but 
the rendering was trivial ... they came to us because we were able to 
turn crap into useful (it's unbelievable what can come from databases or 
generated from web applications, lack of symmetry, multiple escaping, 
bad encodings, inconsistencies) ... unfortunately the money is often 
already spent in getting to the stage where the crap is produced

but anyway after years one kind of knows that there is always a solution 
(also because tex and related tools are so flexible and can help with 
diagnosing)

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Straight Quotes / Curly Quotes
  2021-06-17 22:10 ` Henning Hraban Ramm
  2021-06-17 22:35   ` Hans Hagen
@ 2021-06-18  2:08   ` Thangalin
  2021-06-18  6:09     ` Hans Hagen
  2021-06-18 10:00     ` Henning Hraban Ramm
  1 sibling, 2 replies; 10+ messages in thread
From: Thangalin @ 2021-06-18  2:08 UTC (permalink / raw)
  To: mailing list for ConTeXt users



Hraban,

> In what kind of workflows does your program make sense?

Have you looked around the web lately?

KeenWrite (https://github.com/DaveJarvis/keenwrite), my plain text editor,
can neither convert straight quotes nor make curly quotes easy to type.
Recently, I added ConTeXt integration for exporting to PDF files. ConTeXt
doesn't curl the quotes, which I found a little surprising (because LaTeX
has a quote curling package). Not seeing an obvious solution, I coded my
own library because all the other libraries I found were either not up to
the task or required a massive natural language parser dependency.

My workflow will be: Edit plain text in KeenWrite, export to XHTML, curl
the quotes, run ConTeXt to typeset XHTML.

Another workflow: Edit plain text in KeenWrite, export to XHTML, curl the
quotes, upload to CMS.

The problem is that when typewriters were invented, curly quotes didn't
make it onto the popular layouts. Then, after Unicode, curly closing single
quotes and curly apostrophes were not made unique. HTML entities get it
right, though, with l/rdquo, l/rsquo, and apos. C'est la vie.
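
For reference, a small Python sketch that prints the code point each of these entity names denotes, using the standard library's `html.unescape`. The helper name `code_point` is an assumption for illustration. Note that `apos` resolves to the straight U+0027, so a curly apostrophe in HTML output still shares U+2019 with `rsquo`.

```python
import html

NAMES = ["ldquo", "rdquo", "lsquo", "rsquo", "apos"]


def code_point(name: str) -> str:
    """Return the code point (as U+XXXX) that a named HTML entity denotes."""
    char = html.unescape("&" + name + ";")
    return "U+%04X" % ord(char)


for name in NAMES:
    print(name, code_point(name))
# ldquo U+201C
# rdquo U+201D
# lsquo U+2018
# rsquo U+2019
# apos U+0027
```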


* Re: Straight Quotes / Curly Quotes
  2021-06-18  2:08   ` Thangalin
@ 2021-06-18  6:09     ` Hans Hagen
  2021-06-18 15:48       ` Thangalin
  2021-06-18 10:00     ` Henning Hraban Ramm
  1 sibling, 1 reply; 10+ messages in thread
From: Hans Hagen @ 2021-06-18  6:09 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Thangalin

On 6/18/2021 4:08 AM, Thangalin wrote:
> Hraban,
> 
>  > In what kind of workflows does your program make sense?
> 
> Have you looked around the web lately?
> 
> KeenWrite (https://github.com/DaveJarvis/keenwrite), my plain text editor, can 
> neither convert nor easily type curly quotes into the application. 
> Recently, I added ConTeXt integration for exporting to PDF files. 
> ConTeXt doesn't curl the quotes, which I found a little surprising 
> (because LaTeX has a quote curling package). Not seeing an obvious 
> solution, I coded my own library because all the other libraries I found 
> were either not up to the task or required a massive natural language 
> parser dependency.
> 
> My workflow will be: Edit plain text in KeenWrite, export to XHTML, curl 
> the quotes, run ConTeXt to typeset XHTML.
> 
> Another workflow: Edit plain text in KeenWrite, export to XHTML, curl 
> the quotes, upload to CMS.
> 
> The problem is that when typewriters were invented, curly quotes didn't 
> make it onto the popular layouts. Then, after Unicode, curly closing 
> single quotes and curly apostrophes were not made unique. HTML entities 
> get it right, though, with l/rdquo, l/rsquo, and apos. C'est la vie.
what do you mean by 'latex curls quotes' ... can you give an example?

Hans


* Re: Straight Quotes / Curly Quotes
  2021-06-18  2:08   ` Thangalin
  2021-06-18  6:09     ` Hans Hagen
@ 2021-06-18 10:00     ` Henning Hraban Ramm
  2021-06-18 16:05       ` Thangalin
  1 sibling, 1 reply; 10+ messages in thread
From: Henning Hraban Ramm @ 2021-06-18 10:00 UTC (permalink / raw)
  To: mailing list for ConTeXt users


> Am 18.06.2021 um 04:08 schrieb Thangalin <thangalin@gmail.com>:
> 
> Hraban,
> 
> > In what kind of workflows does your program make sense?
> 
> Have you looked around the web lately?
> 
> KeenWrite (https://github.com/DaveJarvis/keenwrite), my plain text editor, can neither convert nor easily type curly quotes into the application. Recently, I added ConTeXt integration for exporting to PDF files. ConTeXt doesn't curl the quotes, which I found a little surprising (because LaTeX has a quote curling package). Not seeing an obvious solution, I coded my own library because all the other libraries I found were either not up to the task or required a massive natural language parser dependency.
> 
> My workflow will be: Edit plain text in KeenWrite, export to XHTML, curl the quotes, run ConTeXt to typeset XHTML.
> 
> Another workflow: Edit plain text in KeenWrite, export to XHTML, curl the quotes, upload to CMS.
> 
> The problem is that when typewriters were invented, curly quotes didn't make it onto the popular layouts. Then, after Unicode, curly closing single quotes and curly apostrophes were not made unique. HTML entities get it right, though, with l/rdquo, l/rsquo, and apos. C'est la vie.

I’m used to typing special characters with key combinations, and even use my own keyboard layout to access more accented characters via dead keys. (Nothing fancy like Neo, just extensions to Apple’s German keyboard layout.) I always wanted to port that to my Linux machine, but even the default (German) keyboard layout for Linux lets me access curly quotes. And I didn’t find a handy keyboard layout editor like “Ukelele” for Linux. Anyway...

Using \quotation / \quote I avoid typing quotation marks in most cases.

There are exceptions – Hans mentioned missing or open-ended quotes, and sometimes the nesting of commands gets hairy (if quotations span paragraphs with additional markup), so that I manually type the quotation marks.

I regard it as a bad idea to make straight quotation marks (inch marks) active to allow for “curling” them, and would suggest the csquotes package with its \enquote command for LaTeX, even if it’s missing the setups for many languages.

In HTML you should be able to use <q> – I know that doesn’t work reliably in browsers (some add straight quotes to my CSS-configured guillemets).

Anyway, sorry for being negative on your project. It’s great if it helps you and others.

Hraban

* Re: Straight Quotes / Curly Quotes
  2021-06-18  6:09     ` Hans Hagen
@ 2021-06-18 15:48       ` Thangalin
  2021-06-18 17:12         ` Hans Hagen
  0 siblings, 1 reply; 10+ messages in thread
From: Thangalin @ 2021-06-18 15:48 UTC (permalink / raw)
  To: Hans Hagen; +Cc: mailing list for ConTeXt users



The csquotes package can curl straight quotes:

https://ctan.org/pkg/csquotes

I don't know how smart its smart quote feature is, though, with respect to
apostrophes.


* Re: Straight Quotes / Curly Quotes
  2021-06-18 10:00     ` Henning Hraban Ramm
@ 2021-06-18 16:05       ` Thangalin
  2021-06-18 17:03         ` Hans Hagen
  0 siblings, 1 reply; 10+ messages in thread
From: Thangalin @ 2021-06-18 16:05 UTC (permalink / raw)
  To: mailing list for ConTeXt users



> In HTML you should be able to use <q> – I know that doesn’t work reliably
> in browsers (some add straight quotes to my CSS-configured guillemets).

The Converter class maps token replacements:

https://github.com/DaveJarvis/keenquotes/blob/d6c9761f8fe1ae96391f25dc73be52050a148e37/src/main/java/com/whitemagicsoftware/keenquotes/Converter.java#L15

It'd be trivial to use <q> and </q>, instead. For my purposes, HTML
entities work.
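
To illustrate why the swap is small, here is a sketch in Python of a token-to-replacement table. It does not mirror the linked Java Converter class; the token kinds, `ENTITY_MAP`, `Q_MAP`, and `render` are all hypothetical names chosen for the example.

```python
# Hypothetical token kinds mapped to their output text.
ENTITY_MAP = {
    "open_double": "&ldquo;",
    "close_double": "&rdquo;",
    "open_single": "&lsquo;",
    "close_single": "&rsquo;",
    "apostrophe": "&apos;",
}

# Same table, but double quotation marks become <q> elements.
Q_MAP = {**ENTITY_MAP, "open_double": "<q>", "close_double": "</q>"}


def render(tokens, mapping):
    """Join (kind, text) tokens; 'text' tokens pass through unchanged."""
    return "".join(
        text if kind == "text" else mapping[kind] for kind, text in tokens
    )


tokens = [("open_double", '"'), ("text", "Hi"), ("close_double", '"')]
print(render(tokens, ENTITY_MAP))  # &ldquo;Hi&rdquo;
print(render(tokens, Q_MAP))       # <q>Hi</q>
```

Only the table changes between the two outputs; the tokenizing step is untouched.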

> Using \quotation / \quote I avoid typing quotation marks in most cases.

When writing plain text documents, adding TeX code or HTML code to
prescribe how the document should be presented is best avoided, so as to
keep the document decoupled from a particular tool chain. YMMV. A deeper
solution allows users to type the correctly curled quotes directly into the
document.


* Re: Straight Quotes / Curly Quotes
  2021-06-18 16:05       ` Thangalin
@ 2021-06-18 17:03         ` Hans Hagen
  0 siblings, 0 replies; 10+ messages in thread
From: Hans Hagen @ 2021-06-18 17:03 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Thangalin

On 6/18/2021 6:05 PM, Thangalin wrote:
>  > In HTML you should be able to use <q> – I know that doesn’t work 
> reliably in browsers (some add straight quotes to my CSS-configured 
> guillemets).
> 
> The Converter class maps token replacements:
> 
> https://github.com/DaveJarvis/keenquotes/blob/d6c9761f8fe1ae96391f25dc73be52050a148e37/src/main/java/com/whitemagicsoftware/keenquotes/Converter.java#L15
> 
> It'd be trivial to use <q> and </q>, instead. For my purposes, HTML 
> entities work.
> 
>  > Using \quotation / \quote I avoid typing quotation marks in most cases.
> 
> When writing plain text documents, adding TeX code or HTML code to 
> prescribe how the document should be presented is best avoided, so as to 
> keep the document decoupled from a particular tool chain. YMMV. A deeper 
> solution allows users to type the correctly curled quotes directly into 
> the document.
As with many things today this quote is rather English-language centered 
.. tex operates in a multilingual domain and quotes have always been 
dealt with using macros so that we can be sure we get the right ones 
(left/right) with the right spacing.

Hans


* Re: Straight Quotes / Curly Quotes
  2021-06-18 15:48       ` Thangalin
@ 2021-06-18 17:12         ` Hans Hagen
  0 siblings, 0 replies; 10+ messages in thread
From: Hans Hagen @ 2021-06-18 17:12 UTC (permalink / raw)
  To: Thangalin; +Cc: mailing list for ConTeXt users

On 6/18/2021 5:48 PM, Thangalin wrote:
> The csquotes package can curl straight quotes:
> 
> https://ctan.org/pkg/csquotes
> 
> I don't know how smart its smart quote feature is, though, with respect 
> to apostrophes.

me neither and as we always had lots of quote related stuff on board i'm 
also not going to explore it ... when apostrophes get translated as you 
do but with active characters it's no fun (ok, we still have a few in 
context like ~ and |)

just for fun i made

    {\addff{primes} 123'345''\par}

use primes ... in a next upload

Hans



Thread overview: 10+ messages
2021-06-17 20:28 Straight Quotes / Curly Quotes Thangalin
2021-06-17 22:10 ` Henning Hraban Ramm
2021-06-17 22:35   ` Hans Hagen
2021-06-18  2:08   ` Thangalin
2021-06-18  6:09     ` Hans Hagen
2021-06-18 15:48       ` Thangalin
2021-06-18 17:12         ` Hans Hagen
2021-06-18 10:00     ` Henning Hraban Ramm
2021-06-18 16:05       ` Thangalin
2021-06-18 17:03         ` Hans Hagen
