ntg-context - mailing list for ConTeXt users
* new hash for buffer (as file)
From: Pablo Rodriguez via ntg-context @ 2022-09-22 17:25 UTC
  To: ConTeXt users; +Cc: Pablo Rodriguez

Dear list,

playing with buffer contents, I have the following file:

  \setupinteraction[state=start]
  \setupinteractionscreen[option={attachment}]

  \startbuffer[test]
  just a test
  and another one
  \stopbuffer

  \starttext
  \ctxlua{require("util-sha")}

  \def\shabuffer#1%
    {\cldcontext{utilities.sha2.hash256(buffers.raw("#1"))}}

  \def\shafile#1%
    {\cldcontext{utilities.sha2.hash256(io.loaddata("#1"))}}

  \def\shabufferfile#1%
    {\cldcontext{utilities.sha2.hash256(buffers.raw("#1"))}}

  \shabuffer{test}

  \savebuffer[test][temporary-αβγ, prefix=no]

  \shafile{temporary-αβγ}

  \attachment[buffer=test, name=\shabufferfile{test}, method=hidden]
  \stoptext

I mean, to get the hash of the file attached to the document, I need to save
the buffer so that "context(utilities.sha2.hash256(io.loaddata(buffer)))" can read it.

But I don’t need to save the buffer to attach it to the PDF document.

My question is how to define \shabufferfile to avoid \savebuffer (only
required to get the hash).

One approach would be the following. If I’m not totally wrong,
"savebuffer"
(https://github.com/contextgarden/context/blob/main/tex/context/base/mkxl/buff-ini.lmt#L559)
may just be replacing newlines with "\n" in the original buffer
(https://github.com/contextgarden/context/blob/main/tex/context/base/mkxl/buff-ini.lmt#L576).

The function string.replacenewlines() is defined at
https://github.com/contextgarden/context/blob/main/tex/context/base/mkiv/util-str.lua#L1475.

If I’m not totally wrong about savebuffer replacing newlines with "\n",
I wonder how to create a temporary buffer with such a replacement, so
that it could be hashed later.
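Here is a minimal plain-Lua sketch of that replacement. It assumes, as the rest of the thread confirms, that a raw buffer ends each line with CR ("\r") and that the saved file uses LF, or CR+LF on Windows. `buffer_as_file` is a hypothetical name, and the hashing step (`utilities.sha2.hash256`) is left out so the sketch runs without ConTeXt.

```lua
-- Hypothetical helper: turn the CR line ends of a raw buffer into the
-- line ending the saved file is expected to use (LF unless told otherwise).
local function buffer_as_file(raw, eol)
  eol = eol or "\n"
  -- gsub returns the string and a count; the parentheses keep the string
  return (raw:gsub("\r", eol))
end

local raw = "just a test\rand another one"
print(buffer_as_file(raw))          -- two lines, LF-separated
print(#buffer_as_file(raw, "\r\n")) -- one byte more per line break
```

Inside a \startluacode block, something like utilities.sha2.hash256(buffer_as_file(buffers.raw("test"))) would then hash the buffer without writing it to disk. This only holds if the assumed line endings match what \savebuffer actually writes on the current system.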

I hope my question is clear.

Many thanks in advance for your help,

Pablo
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / https://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : https://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : https://contextgarden.net
___________________________________________________________________________________


* Re: new hash for buffer (as file)
From: Max Chernoff via ntg-context @ 2022-09-23  4:01 UTC
  To: ntg-context; +Cc: Max Chernoff, oinos

Hi Pablo,

> I mean, to get hash of the file attached to the document, I need to save
> the buffer for "context(utilities.sha2.hash256(io.loaddata(buffer)))".
> 
> But I don’t need to save the buffer to attach it to the PDF document.
> 
> My question is how to define \shabufferfile to avoid \savebuffer (only
> required to get the hash).

The SHA calculation isn't working properly because of a weird newline
issue. Try this:

   \setupinteraction[state=start]
   \setupinteractionscreen[option={attachment}]
   
   \startbuffer[test]
   just a test
   and another one
   \stopbuffer
   
   \starttext
   \startluacode
       require("util-sha")
   
       function sha256(str)
           return utilities.sha2.hash256(
               str:gsub(string.char(0x0D), string.char(0x0A))
           )
       end
   \stopluacode
   
   \def\shabuffer#1%
   {\cldcontext{sha256(buffers.raw("#1"))}}
   
   \def\shafile#1%
   {\cldcontext{sha256(io.loaddata("#1"))}}
   
   \shabuffer{test}
   
   \savebuffer[test][temporary-αβγ, prefix=no]
   
   \shafile{temporary-αβγ}
   
   \attachment[buffer=test, name=\shabuffer{test}, method=hidden]
   \stoptext
   
You can remove the "\savebuffer" and the "\shafile"; I just kept that in to
show that the two hashes are now the same.
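The `sha256` helper above has one Lua subtlety. `string.gsub` returns two values: the new string and the number of replacements. As a result, `hash256(str:gsub(...))` also passes the count as a second argument. The thread does not say whether `utilities.sha2.hash256` ignores that extra argument. Wrapping the call in an extra pair of parentheses keeps only the string and avoids the question. The plain-Lua demonstration below uses a hypothetical stand-in, `count_args`, in place of the real hash function.

```lua
-- Stand-in for a function like utilities.sha2.hash256: it only reports
-- how many arguments it received (hypothetical, for illustration).
local function count_args(...)
  return select("#", ...)
end

local s = "just a test\rand another one\r"

-- gsub returns the new string and the number of replacements,
-- so both values are forwarded to the callee.
local without = count_args(s:gsub("\r", "\n"))

-- Extra parentheses truncate the call to its first result only.
local with = count_args((s:gsub("\r", "\n")))

print(without, with)
```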

-- Max

* Re: new hash for buffer (as file)
From: Pablo Rodriguez via ntg-context @ 2022-09-23 15:06 UTC
  To: Max Chernoff via ntg-context; +Cc: Pablo Rodriguez

On 9/23/22 06:01, Max Chernoff via ntg-context wrote:
> […]
> The SHA calculation isn't working properly because of a weird newline
> issue. Try this:
> […]
>        function sha256(str)
>            return utilities.sha2.hash256(
>                str:gsub(string.char(0x0D), string.char(0x0A))
>            )
>        end
> […]

Hi Max,

this works perfectly on Linux (that is, "str:gsub('\r','\n')"), but I can’t
make it work on Windows.

I always thought that Unix used LF (\n, if I’m not wrong) to mark a new
line, and Windows used CRLF (\r\n).

How are new lines marked in the buffer? As \r instead of \r\n or \n?

At least, Notepad (the minimal plain text editor in Windows) doesn’t
recognize newlines if I attach the buffer to the PDF document as a .txt
file.

Many thanks for your help,

Pablo

* Re: new hash for buffer (as file)
From: Pablo Rodriguez via ntg-context @ 2022-09-25 17:59 UTC
  To: Pablo Rodriguez via ntg-context; +Cc: Pablo Rodriguez

On 9/23/22 17:06, Pablo Rodriguez via ntg-context wrote:
> On 9/23/22 06:01, Max Chernoff via ntg-context wrote:
>> […]
>>            return utilities.sha2.hash256(
>>                str:gsub(string.char(0x0D), string.char(0x0A))
>>            )
> […]
> this works perfectly fine with Linux "str:gsub('\r','\n')", but I can’t
> make it work in Windows.

Hi again Max,

this seems to solve the issue in Windows too:

  \startbuffer[test]
  just a test
  and another one
  \stopbuffer

  \starttext
  \startluacode
  require("util-sha")

  function sha256(str)
    if os.name == "windows" then
      return utilities.sha2.hash256(str:gsub("\r", "\r\n"))
    else
      return utilities.sha2.hash256(str:gsub("\r", "\n"))
    end
  end
  \stopluacode

  \def\shabuffer#1%
    {\cldcontext{sha256(buffers.raw("#1"))}}

  \def\shafile#1%
    {\cldcontext{utilities.sha2.hash256(io.loaddata("#1"))}}

  \shabuffer{test}

  \savebuffer[test][temporary-αβγ, prefix=no]

  \shafile{temporary-αβγ}

  \stoptext

But what I don’t understand now is this: if the saved file
contains "\r\n", why doesn’t basic Notepad show the new lines?

"\r\n" are the chars to get new lines in Windows. Or what am I missing here?
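You can check which line endings a saved buffer really contains by counting them in the file's bytes, instead of relying on how Notepad displays it. The plain-Lua sketch below uses `count_eols`, a hypothetical helper. Read the file in binary mode (for example with io.open(name, "rb")) so that Windows does not translate line endings on the way in.

```lua
-- Hypothetical diagnostic: count each kind of line ending in a string,
-- e.g. the bytes of a saved buffer read in binary mode.
local function count_eols(data)
  local _, crlf = data:gsub("\r\n", "")
  local _, cr   = data:gsub("\r", "")
  local _, lf   = data:gsub("\n", "")
  -- Lone CR and lone LF are the totals minus the CR+LF pairs.
  return { crlf = crlf, cr = cr - crlf, lf = lf - crlf }
end

local counts = count_eols("just a test\r\nand another one\r\n")
print(counts.crlf, counts.cr, counts.lf)
```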

Many thanks for your help,

Pablo

* Re: new hash for buffer (as file)
From: Max Chernoff via ntg-context @ 2022-09-26  0:05 UTC
  To: ntg-context; +Cc: Max Chernoff, oinos


Hi Pablo,

> But now I don’t understand is the following issue: if the saved file
> contains "\r\n", why does basic Notepad the new lines?
> 
> "\r\n" are the chars to get new lines in Windows. Or what am I missing here?

I'm not too sure what you're asking here, but Notepad was somewhat
recently updated to handle both CRLF and LF line endings:

   https://devblogs.microsoft.com/commandline/extended-eol-in-notepad/
   
But I do agree that the line ending handling seems a little odd. I find it
surprising that the buffers internally use CR line endings, since no system
has used them in the past 20 years.

Also, you should probably make sure that the bytes written to the file
don't depend on the current code page on Windows. Try writing out a
buffer from ConTeXt with the following contents:

   АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
   
First, run "chcp 65001" before running "context" and record the size of the
file written. Then, run "chcp 1251" and run "context" again. Hopefully the
file size doesn't change; but if it does, then that means that the binary
content of any file written will depend on the system's default code page,
which would complicate making reproducible hashes.
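Here are the expected numbers. The 64 letters in that test line are exactly U+0410 (А) through U+044F (я). Each takes two bytes in UTF-8 but only one byte in code page 1251. A byte-for-byte round trip should therefore write 128 bytes for the letters alone, plus whatever line ending follows. A file about half that size would point to re-encoding. The plain-Lua sketch below shows the count and needs Lua 5.3 or later for the utf8 library.

```lua
-- Build the test line from its code points: U+0410 (А) .. U+044F (я).
local chars = {}
for cp = 0x0410, 0x044F do
  chars[#chars + 1] = utf8.char(cp)
end
local line = table.concat(chars)

print(utf8.len(line)) -- number of characters
print(#line)          -- number of bytes in UTF-8
-- In Windows-1251 each of these letters is a single byte, so a copy
-- re-encoded to that code page would hold only utf8.len(line) bytes.
```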
   
-- Max

* Re: new hash for buffer (as file)
From: Hans Hagen via ntg-context @ 2022-09-26  8:47 UTC
  To: Max Chernoff via ntg-context; +Cc: Hans Hagen

On 9/26/2022 2:05 AM, Max Chernoff via ntg-context wrote:
> 
> Hi Pablo,
> 
>> But now I don’t understand is the following issue: if the saved file
>> contains "\r\n", why does basic Notepad the new lines?
>>
>> "\r\n" are the chars to get new lines in Windows. Or what am I missing here?
> 
> I'm not too sure what you're asking here, but Notepad was somewhat-
> recently updated to handle both CRLF and LF line endings:
> 
>     https://devblogs.microsoft.com/commandline/extended-eol-in-notepad/
>     
> But I do agree that the line ending handling seems a little odd. I find it
> surprising that the buffers internally use CR line endings since no systems
> in the past 20 years use that.

how about tex ...

\number\endlinechar
\number\numexpr`M-`A+1\relax % plain sets up `^^M

... you don't want to know how much hassle dealing with line endings in 
tex is
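Hans's second line works out where the CR comes from. `M is 77 and `A is 65, so `M-`A+1 is 13. That is the position of M in the alphabet, and hence the code of control-M (^^M), the carriage return. TeX appends \endlinechar (13 by default) to every input line, which, as Max concludes later in the thread, explains the CR line ends in raw buffers. Here is the same arithmetic in plain Lua:

```lua
-- TeX's `M-`A+1: position of M in the alphabet, which is also the code
-- of the control character ^^M (control-M).
local ctrl_m = string.byte("M") - string.byte("A") + 1

print(ctrl_m)                      -- TeX's default \endlinechar
print(string.char(ctrl_m) == "\r") -- ^^M is carriage return (CR)
print(string.byte("\n"))           -- LF, for comparison
```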

> Also, you should probably check to make sure that the results of the
> file don't depend on the current code page on Windows. Try writing out a
> buffer from ConTeXt with the following contents:
> 
>     АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
>     
> First, run "chcp 65001" before running "context" and record the size of the
> file written. Then, run "chcp 1251" and run "context" again. Hopefully the
> file size doesn't change; but if it does, then that means that the binary
> content of any file written will depend on the system's default code page,
> which would complicate making reproducible hashes.
if that were the case nothing would work .. so it's bytes in - bytes out

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: new hash for buffer (as file)
From: Pablo Rodriguez via ntg-context @ 2022-09-26 17:24 UTC
  To: Max Chernoff via ntg-context; +Cc: Pablo Rodriguez

On 9/26/22 02:05, Max Chernoff via ntg-context wrote:
>
> Hi Pablo,
>
>> But now I don’t understand is the following issue: if the saved file
>> contains "\r\n", why does basic Notepad the new lines?
>>
>> "\r\n" are the chars to get new lines in Windows. Or what am I missing here?
>
> I'm not too sure what you're asking here, but Notepad was somewhat-
> recently updated to handle both CRLF and LF line endings:
>
>    https://devblogs.microsoft.com/commandline/extended-eol-in-notepad/

Hi Max,

I realized later that I was doing something wrong. My fault here.

> [...]
> Also, you should probably check to make sure that the results of the
> file don't depend on the current code page on Windows. Try writing out a
> buffer from ConTeXt with the following contents:
>
>    АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
>
> First, run "chcp 65001" before running "context" and record the size of the
> file written. Then, run "chcp 1251" and run "context" again. Hopefully the
> file size doesn't change; but if it does, then that means that the binary
> content of any file written will depend on the system's default code page,
> which would complicate making reproducible hashes.

For more than two decades, all my TeX sources have been written in UTF-8.

I thought that ConTeXt would output the same character encoding as in
the source file when saving a buffer.

I haven’t run into this issue, and I’d say that all my saved buffers are
UTF-8 encoded.

Many thanks for your help,

Pablo

* Re: new hash for buffer (as file)
From: Hans Hagen via ntg-context @ 2022-09-26 18:07 UTC
  To: mailing list for ConTeXt users; +Cc: Hans Hagen, Pablo Rodriguez

On 9/26/2022 7:24 PM, Pablo Rodriguez via ntg-context wrote:
> On 9/26/22 02:05, Max Chernoff via ntg-context wrote:
>>
>> Hi Pablo,
>>
>>> But now I don’t understand is the following issue: if the saved file
>>> contains "\r\n", why does basic Notepad the new lines?
>>>
>>> "\r\n" are the chars to get new lines in Windows. Or what am I missing here?
>>
>> I'm not too sure what you're asking here, but Notepad was somewhat-
>> recently updated to handle both CRLF and LF line endings:
>>
>>     https://devblogs.microsoft.com/commandline/extended-eol-in-notepad/
> 
> Hi Max,
> 
> I realized later that I was doing something wrong. My fault here.
> 
>> [...]
>> Also, you should probably check to make sure that the results of the
>> file don't depend on the current code page on Windows. Try writing out a
>> buffer from ConTeXt with the following contents:
>>
>>     АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя
>>
>> First, run "chcp 65001" before running "context" and record the size of the
>> file written. Then, run "chcp 1251" and run "context" again. Hopefully the
>> file size doesn't change; but if it does, then that means that the binary
>> content of any file written will depend on the system's default code page,
>> which would complicate making reproducible hashes.
> 
> For more than two decades, all my TeX sources are written in UTF-8.
> 
> I thought that ConTeXt would output the same character encoding as in
> the source file when saving a buffer.
> 
> I haven’t found this issue and I’d say that all my saved buffers are
> UTF-8 encoded.
the magic is in

savedata(name,replacenewlines(content),"\n",option == v_append)

because tex reads the input and then loses what it saw (cr, lf, crlf), we use the
line endings of the operating system (good old typewriters and windows
use cr+lf, old macs used cr, while linux uses lf)
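Hans's point is that by the time a buffer is saved, CR, LF, and CR+LF input have all become the same internal line end. The file can therefore only get the one ending that ConTeXt chooses. The plain-Lua sketch below illustrates that folding. `normalize_newlines` is hypothetical and is not the actual `string.replacenewlines` from util-str.lua.

```lua
-- Hypothetical normalizer (not ConTeXt's actual replacenewlines):
-- any of CR, LF or CR+LF becomes the single line ending `eol`.
local function normalize_newlines(text, eol)
  -- First fold CR+LF and lone CR into LF ...
  local lf = text:gsub("\r\n?", "\n")
  -- ... then emit the requested line ending (parentheses drop the count).
  return (lf:gsub("\n", eol))
end

for _, src in ipairs({ "a\rb", "a\nb", "a\r\nb" }) do
  -- All three inputs give the same bytes for a given target ending.
  print(normalize_newlines(src, "\r\n") == "a\r\nb")
end
```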

Hans


* Re: new hash for buffer (as file)
From: Max Chernoff via ntg-context @ 2022-09-26 23:01 UTC
  To: ntg-context; +Cc: Max Chernoff

Hi Hans, Pablo,


> > But I do agree that the line ending handling seems a little odd. I find it
> > surprising that the buffers internally use CR line endings since no systems
> > in the past 20 years use that.
> 
> how about tex ...
> 
> \number\endlinechar
> \number\numexpr`M-`A+1\relax % plain sets up `^^M

Argh, how could I have forgotten about that. Yes, that makes complete
sense.

> > First, run "chcp 65001" before running "context" and record the size of the
> > file written. Then, run "chcp 1251" and run "context" again. Hopefully the
> > file size doesn't change; but if it does, then that means that the binary
> > content of any file written will depend on the system's default code page,
> > which would complicate making reproducible hashes.
>
> if that were the case nothing would work .. so it's bytes in - bytes out

Ok good, that's what I was expecting. I've unfortunately used some
programs that even fairly recently depended on the system code page, so
I'm always a little cautious.

> Hi Max,
> 
> I realized later that I was doing something wrong. My fault here.

Glad that you've figured it out.

> I thought that ConTeXt would output the same character encoding as in
> the source file when saving a buffer.

Yes, Hans confirmed that that is correct. 

Thanks,
-- Max



Thread overview: 9+ messages
2022-09-22 17:25 new hash for buffer (as file) Pablo Rodriguez via ntg-context
2022-09-23  4:01 ` Max Chernoff via ntg-context
2022-09-23 15:06   ` Pablo Rodriguez via ntg-context
2022-09-25 17:59     ` Pablo Rodriguez via ntg-context
2022-09-26  0:05       ` Max Chernoff via ntg-context
2022-09-26  8:47         ` Hans Hagen via ntg-context
2022-09-26 23:01           ` Max Chernoff via ntg-context
2022-09-26 17:24         ` Pablo Rodriguez via ntg-context
2022-09-26 18:07           ` Hans Hagen via ntg-context
