How to add functions for setting the case of the first char in a string in Lua. This takes advantage of two features of `string.gsub`:

- If the third argument, the replacement, is a function, it is called with each match (or with the capture(s) of the match, if the pattern has any) as argument(s), and the replacement is whatever it returns. The return value must be a string or a number; if it is `false` or `nil` the match is left unchanged.
- If a non-negative integer is passed as the fourth argument, at most that many substitutions are made, so by passing `1` as argument \#4 we can substitute just the first match.
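
Both features can be seen in isolation with the standard `string` library alone. The following is only a minimal sketch: the sample string and the variable names are made up for illustration, and `string.upper` handles ASCII letters only.

``` lua
local s = 'one two three'

-- No captures in the pattern: the function receives the whole match.
local all = s:gsub('%a+', string.upper)

-- With captures: the function receives the captures instead of the match.
local capitalized = s:gsub('(%a)(%a*)', function(first, rest)
  return first:upper() .. rest
end)

-- Fourth argument 1: only the first match is substituted.
-- gsub also returns the number of substitutions as a second value.
local first_only, count = s:gsub('%a+', string.upper, 1)

print(all)          -- ONE TWO THREE
print(capitalized)  -- One Two Three
print(first_only, count)
```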

``` lua
local charpat = utf8.charpattern
for _, case in ipairs({ 'upper', 'lower' }) do
  local case_fun = pandoc.text[case]
  local name = case .. '_first'
  pandoc.text[name] = function(s)
    local stype = type(s)
    if 'string' ~= stype then
      error("Argument must be string, not " .. stype)
    end
    -- Set case of the first match against charpat in s.
    -- The parentheses discard gsub's second return value (the match count).
    return (s:gsub(charpat, case_fun, 1))
  end
end
```
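
To try the idea outside of Pandoc, something like the following sketch should work in a plain Lua 5.3+ interpreter. The fallback table and the standalone `upper_first` helper are assumptions made for illustration only; without `pandoc.text` the fallback changes the case of ASCII letters only.

``` lua
-- Use pandoc.text when running inside pandoc; otherwise fall back to the
-- ASCII-only functions of the standard string library (an assumption made
-- only so that this sketch also runs without pandoc).
local text = (pandoc and pandoc.text)
  or { upper = string.upper, lower = string.lower }

-- Hypothetical standalone helper, not part of any library.
local function upper_first(s)
  -- gsub returns the new string and the number of substitutions;
  -- the extra parentheses keep only the string.
  return (s:gsub(utf8.charpattern, text.upper, 1))
end

local result = upper_first('hello world')
-- table.pack records how many values the call returned in field n.
local packed = table.pack(upper_first('hello world'))

print(result)    -- Hello world
print(packed.n)  -- 1
```

An empty string contains no match for `utf8.charpattern`, so it comes back unchanged.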

If we could use lua-utf8 <https://github.com/starwing/luautf8> we could match the first letter instead, since the character classes in its `gsub` are Unicode-aware: as far as I can tell `%a` matches any Unicode letter, while `%l` matches lowercase letters only. We could even say

``` lua
lutf8.gsub(s, '%f[_%w]%a', lutf8.upper, 1)
```

to uppercase the first letter of the first word!

Theoretically you could use a Unicode-aware regex library. The lrexlib Oniguruma binding would probably be a good choice:

<https://github.com/rrthomas/lrexlib> <http://rrthomas.github.io/lrexlib/manual.html> <http://rrthomas.github.io/lrexlib/manual.html#oniguruma-only-functions-and-methods>

Oniguruma has a very full-featured (not Lua-like!) regex syntax and some other very useful features, although lrexlib doesn’t support all of them. On the other hand lrexlib adds some features of its own to all its bindings, notably Lua-like `match`, `find`, `gmatch` and `gsub` functions, a `split` function, and a `tfind` function which returns a table with captures, including named captures if the library supports them.[^1]

However, luautf8 and the lrexlib libraries can’t be used with the statically linked Pandoc binaries, and I don’t expect that any of them can be included with Pandoc, as they are rather big. Let’s see how `lpeg.utfR` fares with hugeish alternations like “all LGC (uppercase/lowercase) letters”, which break memory with the `lpeg.P(char) + lpeg.P(char) + ...` approach. Unfortunately the fact that most extension blocks list letters in alternating upper/lower pairs probably speaks against it, since a range over such a block can’t pick out just one case.

[^1]: FWIW I wouldn’t mind a ~~`tgsub`~~ function which would take a function as argument \#3 and pass it a table in the style of `tfind`, although you can fake it by looping over a string with `tfind` in Lua.
