caml-list - the Caml user's mailing list
* [Caml-list] File synchronization implementation(s) in OCaml?
From: Evgeny Roubinchtein @ 2018-02-08  3:05 UTC
  To: OCaml Mailing List

Dear OCaml users and developers,

Do you have advice on:

1. Practical file synchronization algorithms.  Rsync is the low bar for my
purposes here, i.e., I don't want anything that performs worse than rsync
in practice, but I am wondering if there is a way to do better.  My
completely uninformed attempt at searching the literature turned up this
paper: http://engineering.nyu.edu/~suel/papers/recon.pdf, but I don't know
anything about the area, so I am afraid that I don't even know what I don't
know about the subject :-).  I am also aware that Unison has an
implementation of an rsync-like algorithm, but I don't know much more than
that about that implementation.

2. Existing implementation(s) of said algorithms in OCaml.

Thank you in advance!

-- 
Best,
Evgeny ("Zhenya")


* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Malcolm Matalka @ 2018-02-08  7:24 UTC
  To: Evgeny Roubinchtein; +Cc: OCaml Mailing List

Check out Unison

https://www.cis.upenn.edu/~bcpierce/unison/


* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Francois BERENGER @ 2018-02-08  8:01 UTC
  To: caml-list

On 02/08/2018 12:05 PM, Evgeny Roubinchtein wrote:
> Dear OCaml users and developers,
> 
> Do you have advice on:
> 
> 1. Practical file synchronization algorithms.  Rsync is the low bar for
> my purposes here, i.e., I don't want anything that performs worse than
> rsync in practice, but I am wondering if there is a way to do better. 
> My completely uninformed attempt at searching the literature turned up
> this paper: http://engineering.nyu.edu/~suel/papers/recon.pdf, but I
> don't know anything about the area, so I am afraid that I don't even
> know what I don't know about the subject :-).  I am also aware that
> Unison has an implementation of an rsync-like algorithm, but I don't
> know much more than that about that implementation.

The algorithm behind the tarsnap service looks cool.
I think it works even on binary files.

https://www.tarsnap.com/

I think the exact algorithm is given in Colin Percival's thesis:
https://ora.ox.ac.uk/objects/uuid:4f0d53cc-fb9f-4246-a835-3c8734eba735/datastreams/THESIS01


* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Cedric Cellier @ 2018-02-08 13:11 UTC
  To: OCaml Mailing List

-[ Wed, Feb 07, 2018 at 07:05:15PM -0800, Evgeny Roubinchtein ]----
> I don't want anything that performs worse than rsync in practice,

It would be interesting to know how you measure performance as one could
think of many metrics:

- speed on different data
- speed on similar data
- reliability in the face of simultaneous synchronizations
- reliability in case of bad network
- usage of resources
- confidentiality
- ...?



* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Yaron Minsky @ 2018-02-08 14:44 UTC
  To: Cedric Cellier; +Cc: OCaml Mailing List

Ha! The paper you linked builds off of an old paper I wrote:

http://cis.poly.edu/westlab/papers/ref/practical.pdf

There is in fact a full implementation of the algorithms in this paper
in OCaml, as part of the SKS system that I wrote many years ago.

https://bitbucket.org/skskeyserver/sks-keyserver/wiki/Home

I'm not especially proud of the code, but it does work...

y


* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Evgeny Roubinchtein @ 2018-02-08 15:42 UTC
  To: OCaml Mailing List

Thank you everyone for all the responses and pointers to artefacts and
papers.

Cedric, that is an excellent point; thank you.  My current intended use is
having the user edit file(s) on host A using a text editor, and, when the
user saves a file, having those edits reflected as quickly as possible in
the corresponding file on host B.  So, my primary measure of performance is
the speed of update: I want the time between the events "the user, who runs
the editor on host A, has issued a command to the editor to save the file"
and "the contents of the file the user has just saved on host A and of the
corresponding file on host B are identical" to be as low as feasible (there
is a mapping between file paths on hosts A and B).  Sometimes, the user
will create a new file on host A; then the file's contents need to be put
as quickly as possible into the corresponding location on host B.  One
other consideration is that occasionally the user may have their text
editor save a number of files in quick (at least on a human scale)
succession: for example, the user issues "M-x compile" in Emacs, and Emacs
offers to save all modified buffers before running the compilation.  In that
case, the total time to propagate the changes to all modified file(s)
should be as low as possible.  So, to summarize:

1. One-way synchronization is acceptable (I may find out otherwise with
experience, but, for now, I am willing to make that assumption)
2. It is always known which file(s) have been modified.
3. It is probably feasible to plumb through the information about what
part(s) of a file were modified for each file from the text editor to
the updater, if that helps with the speed of the update.  The editor
usually "knows" what has changed, because it needs to be able to undo the
changes.
4. The relevant metric is the speed of the update.
5. Sometimes changes to more than one file may be saved in quick (human
scale) succession.  The relevant metric is still the speed of update to all
files.
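
To make point 3 concrete, here is a minimal OCaml sketch of what the
receiving side on host B might do if the editor can report each change as
an offset, a number of bytes removed, and the inserted text.  This is only
an illustration under that assumption, not something proposed in this
thread: the `edit` record and the `apply_edit` / `apply_edits` names are
hypothetical, and a real updater would also need a transport, error
handling, and a fallback to a full transfer when the two copies diverge.

```ocaml
(* Hypothetical description of one change reported by the editor:
   replace [del_len] bytes starting at [offset] with the string [ins]. *)
type edit = { offset : int; del_len : int; ins : string }

(* Apply a single edit to the in-memory contents of a file.
   Raises [Invalid_argument] if the edit does not fit the contents. *)
let apply_edit (contents : string) (e : edit) : string =
  let n = String.length contents in
  if e.offset < 0 || e.del_len < 0 || e.offset + e.del_len > n then
    invalid_arg "apply_edit: edit out of bounds";
  let before = String.sub contents 0 e.offset in
  let after =
    String.sub contents (e.offset + e.del_len) (n - e.offset - e.del_len)
  in
  String.concat "" [ before; e.ins; after ]

(* Apply edits in order.  Each offset refers to the contents produced by
   the previous edit, i.e. the order in which the editor made them. *)
let apply_edits (contents : string) (edits : edit list) : string =
  List.fold_left apply_edit contents edits

let () =
  let old_contents = "let x = 1\nlet y = 2\n" in
  (* Replace the single character "1" (at byte offset 8) with "42". *)
  let edits = [ { offset = 8; del_len = 1; ins = "42" } ] in
  print_string (apply_edits old_contents edits)
```

With this representation only the edit records cross the network, not the
whole file.  The sender could also ship a checksum of the expected result
so that host B can detect divergence and request a full copy.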

I apologize for the somewhat long-winded description of the metric I care
about.  I do hope that clarifies it.

-- 
Best,
Evgeny ("Zhenya")


* Re: [Caml-list] File synchronization implementation(s) in OCaml?
From: Hendrik Boom @ 2018-02-08 15:50 UTC
  To: caml-list

Unless your requirements are very different from mine, I suspect you want
distributed revision management.  The system I use for that is monotone.
Once set up, it's a lot easier to use than git.


-- hendrik

Thread overview: 7+ messages in thread
2018-02-08  3:05 [Caml-list] File synchronization implementation(s) in OCaml? Evgeny Roubinchtein
2018-02-08  7:24 ` Malcolm Matalka
2018-02-08  8:01 ` Francois BERENGER
2018-02-08 13:11 ` Cedric Cellier
2018-02-08 14:44   ` Yaron Minsky
2018-02-08 15:42     ` Evgeny Roubinchtein
2018-02-08 15:50       ` Hendrik Boom
