public inbox archive for pandoc-discuss@googlegroups.com
* More on changing the case of the first character in Lua
@ 2023-10-05 14:01 BPJ
From: BPJ @ 2023-10-05 14:01 UTC (permalink / raw)
  To: pandoc-discuss


Here is how to add functions to `pandoc.text` which set the case of the
first character of a string in Lua. This takes advantage of two features of
`string.gsub`:

- If the third argument, the replacement, is a function, it is called for
each match with the capture(s) as argument(s), or with the whole match if
the pattern has no captures. Whatever it returns becomes the replacement.
The return value must be a string or a number, except that `false` or `nil`
leaves the match unchanged.
- If the fourth argument is a (non-negative) integer, at most that many
substitutions are made, so by passing `1` as argument \#4 we can
substitute just the first match.
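
To see the two features in isolation, here is a minimal sketch in plain Lua. The `shout` helper and the sample strings are made up for illustration and use the byte-oriented `string.upper`, not `pandoc.text`.

```lua
-- Hypothetical helper: with no captures in the pattern, the replacement
-- function receives the whole match.
local function shout(word) return word:upper() end

-- No limit: every match is replaced; gsub also returns the match count.
local all, n_all = string.gsub('one two three', '%a+', shout)

-- Limit of 1 as the fourth argument: only the first match is replaced.
local first, n_first = string.gsub('one two three', '%a+', shout, 1)

-- With captures, the function receives the captures instead.
local swapped = string.gsub('key=value', '(%w+)=(%w+)', function(k, v)
  return v .. '=' .. k
end)

-- Returning nil (or false) keeps the original match.
local kept = string.gsub('abc', '%a', function(c)
  if c == 'b' then return 'B' end
end)

print(all, n_all)
print(first, n_first)
print(swapped)
print(kept)
```

The functions that follow combine both features with a pattern that matches one UTF-8 character.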

``` lua
local charpat = utf8.charpattern
for _, case in ipairs({ 'upper', 'lower' }) do
  local case_fun = pandoc.text[case]
  local name = case .. '_first'
  pandoc.text[name] = function(s)
    local stype = type(s)
    if 'string' ~= stype then
      error("Argument must be string, not " .. stype)
    end
    -- Set case of the first match against charpat in s; the parentheses
    -- drop the substitution count that gsub returns as a second value.
    return (s:gsub(charpat, case_fun, 1))
  end
end
```
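
The reason for matching against `utf8.charpattern` rather than `.` can be shown in plain Lua 5.3 or later. This is only a sketch: `fake_upper` is a hypothetical stand-in for `pandoc.text.upper` so that it runs outside pandoc, and it knows only about é and ASCII.

```lua
local s = '\u{E9}lan'  -- "élan"; é takes two bytes in UTF-8

-- utf8.charpattern matches one whole UTF-8 encoded character ...
local first_char = s:match(utf8.charpattern)
-- ... whereas '.' matches a single byte, here only half of é.
local first_byte = s:match('.')

-- Hypothetical stand-in for pandoc.text.upper (é and ASCII only).
local function fake_upper(c)
  if c == '\u{E9}' then return '\u{C9}' end
  return c:upper()
end

-- Assigning to a single variable keeps only the string and discards
-- the substitution count that gsub returns as a second value.
local out = s:gsub(utf8.charpattern, fake_upper, 1)

print(#first_char, #first_byte, out)
```

Inside a filter the same call shape applies to the real functions, e.g. `local out = pandoc.text.upper_first(s)`.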

If we could use lua-utf8 <https://github.com/starwing/luautf8> we could
match the first letter instead, since its `gsub` is Unicode-aware: `%a`
matches any Unicode letter (`%l`, as in stock Lua, matches only lowercase
letters). We could even say

``` lua
local lutf8 = require 'lua-utf8'
lutf8.gsub(s, '%f[_%w]%a', lutf8.upper, 1)
```

to uppercase the first letter of the first word that begins with a letter!

Theoretically you could use a Unicode-aware regex library. The
lrexlib Oniguruma binding

<https://github.com/rrthomas/lrexlib>
<http://rrthomas.github.io/lrexlib/manual.html>
<http://rrthomas.github.io/lrexlib/manual.html#oniguruma-only-functions-and-methods>

would probably be a good choice. Oniguruma has a very full-featured (not
Lua-like!) regex syntax and some other very useful features, although
lrexlib doesn’t support all of them. On the other hand lrexlib adds some
features of its own to all its bindings, notably Lua-like `match`, `find`,
`gmatch` and `gsub` functions, a `split` function, and a `tfind` function
which returns a table with the captures, including named captures if the
library supports them and the pattern has any.[^1]

However, luautf8 and the lrexlib libraries can’t be used with the
statically linked Pandoc binaries, and I don’t expect that any of them can
be included with Pandoc, as they are rather big. Let’s see how `lpeg.utfR`
fares with hugeish alternations like “all LGC (Latin, Greek, Cyrillic)
uppercase/lowercase letters”, which break memory with the `lpeg.P(char) +
lpeg.P(char) + ...` approach. Unfortunately the fact that most extension
blocks list letters in upper/lower pairs probably speaks against it: where
the cases alternate code point by code point, a range for one case can
cover only a single letter.
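
To make the concern about alternating blocks concrete, here is a rough sketch, not a benchmark. It assumes LPeg 1.1 or later, which provides `lpeg.utfR(cp1, cp2)`. The choice of blocks is purely illustrative and nowhere near a complete uppercase set.

```lua
-- Assumption: LPeg 1.1+; in a pandoc Lua filter `lpeg` is already a global.
local lpeg = lpeg or require 'lpeg'

-- Where one case occupies a contiguous run, one range per run suffices.
local upper = lpeg.utfR(0x41, 0x5A)     -- Basic Latin A..Z
            + lpeg.utfR(0x410, 0x42F)   -- Cyrillic U+0410..U+042F

-- Latin Extended-A alternates: U+0100 is uppercase, U+0101 lowercase,
-- and so on, so each uppercase letter in U+0100..U+012F needs its own
-- one-code-point range.
local n_single = 0
for cp = 0x100, 0x12E, 2 do
  upper = upper + lpeg.utfR(cp, cp)
  n_single = n_single + 1
end

print(n_single)                 -- number of single-code-point ranges
print(upper:match('\u{100}'))   -- U+0100, uppercase: matches
print(upper:match('\u{101}'))   -- U+0101, lowercase: no match
```

So for this part of the repertoire `lpeg.utfR` still ends up as one alternative per letter. Whether that is lighter on memory than the `lpeg.P(char)` approach would have to be measured.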

[^1]: FWIW I wouldn’t mind a ~~`tgsub`~~ function which would take a
function as argument \#3 and pass it a table in the style of `tfind`,
although you can fake it by looping over a string with `tfind` in Lua.
