Gnus development mailing list
* Repost: Time to revisit the message id generation algorithm?
From: Johann 'Myrkraverk' Oskarsson @ 2012-02-01 18:18 UTC
  To: Gnus Ding

Hi all,

Reposting this here from gnu.emacs.gnus:


I have two issues with the message id generation algorithm, the
(message-unique-id) function.

1) It bleeds information.  This is an issue for those who use TOR or
other anonymizers.

2) It may not be unique (anymore).

Let's start with issue 1).
==========================

The first two hash characters are derived from the Unix user id, or
simply from the user name for those using MS-DOS, VMS or OS/2, though I
was unable to find anything more recent than Emacs 20.6 for OS/2.  There
are GNU Emacs versions newer than 21 out there for MS-DOS and VMS.

The last four characters are .fsf.  This is unique to Gnus.

Let's consider a person creating an anonymous email account, say via
TOR, though any other such service will do.  For the sake of argument, I
do not consider Yahoo, Gmail, etc. anonymous, since they include the
originating IP address when using SMTP.  She even has the forethought to
set gnus-user-agent to nil and, of course, mail-host-address to the
domain of the email service.

Now, said person writes an email to the CEO about unethical practices in
the corporation where she's working.  It just so happens that the CEO is
in on the practices, and she's known to be the only one using Gnus for
email, so the message id exposes her.  Ergo, she's in trouble.

Even if she is not the only one using Gnus, the first two hash characters
expose her Unix user id.  Most people use their own workstation with the
default user id for the first account, but that number tends to differ
between distros and Unix versions[*].  This may be enough to trace the
email to a specific person.
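
To illustrate how little it takes to tell such accounts apart, here is
a minimal Python sketch that base-36 encodes the user ids listed in the
footnote.  The digit alphabet (0-9 followed by a-z) and the helper name
base36 are assumptions made for illustration; the encoder Gnus actually
uses may order its digits differently, so the real prefixes can differ.

```python
import string

# Assumed digit alphabet: 0-9 then a-z. Gnus may use another ordering.
DIGITS = string.digits + string.ascii_lowercase

def base36(n: int) -> str:
    """Encode a non-negative integer in base 36 (hypothetical helper)."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, rem = divmod(n, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))

# The first-account user ids mentioned in the footnote.
for uid in (101, 500, 501, 1000, 1001):
    print(uid, base36(uid))
# Prints: 101 2t, 500 dw, 501 dx, 1000 rs, 1001 rt (one pair per line)
```

Under this assumed encoding each of the five ids yields a distinct
two-character prefix, which is all an observer needs to separate them.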

Depending on the seriousness of said unethical practices, that person
may have just lost her life to (message-unique-id).

And now to issue 2).
====================

As said before, the first two hash characters are the unix user id.  As
many people are using their own workstations with the default user id,
this is not very unique anymore.

The rest of the hash is calculated from a counter,
message-unique-id-char, and the current Unix timestamp in seconds.  It
is very probable that at any given point in time two people have the
same value of the counter[**].
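
A toy model makes the collision argument concrete.  The Python sketch
below is not the Gnus implementation; it is a deliberately simplified
stand-in that combines only the three inputs named above (user id,
counter, timestamp in seconds) and appends .fsf.  The function names,
the field widths and the digit alphabet are all assumptions.

```python
import string

# Assumed digit alphabet: 0-9 then a-z. Gnus may use another ordering.
DIGITS = string.digits + string.ascii_lowercase

def base36(n: int, width: int = 0) -> str:
    """Base-36 encode n, left-padded with '0' to the given width."""
    out = ""
    while n > 0:
        n, rem = divmod(n, 36)
        out = DIGITS[rem] + out
    return out.rjust(width, "0") or "0"

def toy_unique_id(uid: int, counter: int, unix_seconds: int) -> str:
    """Hypothetical simplification: uid prefix, counter, time, '.fsf'."""
    return base36(uid) + base36(counter, 2) + base36(unix_seconds, 7) + ".fsf"

# Two unrelated users with the same default uid, the same counter value
# and clocks that read the same second get the same left-hand side.
alice = toy_unique_id(uid=1000, counter=7, unix_seconds=1328120280)
bob = toy_unique_id(uid=1000, counter=7, unix_seconds=1328120280)
print(alice == bob)  # True
```

In this model nothing but the host part after the @ separates the two
ids, so the left-hand sides only stay distinct if the uid, the counter
or the clock reading differs.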

Now it is just a matter of those two people pressing C-c C-c at the
same time.  This is not so far-fetched, as the workstations' clocks may
not be in sync, so the two only need to send at the same local clock
reading, not at the same instant.

This is unlikely to be a problem except for people who set
mail-host-address to their email provider, say Google or Yahoo.


Final words.
============

I am not proposing any specific change at the moment.  As more and more
people are using anonymizers like TOR, bleeding information is not a
good idea anymore.  And as many people are using public news servers and
email providers, and probably setting their mail-host-address
accordingly, the chance of id clashes is growing.


[*] For example, mine is 101, while regular Linux users will often have
500 or 1000, or maybe 501 or 1001.

[**] The calculation seems to be a bit more involved than just a 1+
counter, but that is what has been the case in my experiments.




* Re: Repost: Time to revisit the message id generation algorithm?
From: Russ Allbery @ 2012-02-01 23:29 UTC
  To: ding

"Johann 'Myrkraverk' Oskarsson" <myrkraverk@gmx.com> writes:

> Reposting this here from gnu.emacs.gnus:

> I have two issues with the message id generation algorithm, the
> (message-unique-id) function.

> 1) It bleeds information.  This is an issue for those who use TOR or
> other anonymizers.

> 2) It may not be unique (anymore).

There is some discussion of the problems of generating message IDs at:

    http://tools.ietf.org/html/draft-ietf-usefor-message-id-01

although it doesn't touch on the privacy issues.  The recommended
algorithm there is to use a message ID LHS generated from the local
timestamp with as much precision as possible, followed by some sort of
hopefully unique local data.  The suggestion in that draft that I like the
best (although it's a bit slow) is to take a hash of the message to get
the additional unique data, although using a pseudorandom number generator
isn't a bad idea.
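
As a rough illustration of that recommendation (a sketch only, not code
taken from the draft or from Gnus), the left-hand side could be built
in Python from a nanosecond timestamp plus a truncated hash of the
message.  The function name sketch_message_id, the choice of SHA-256,
the 16-character truncation and the domain example.invalid are all
assumptions.

```python
import hashlib
import time

def sketch_message_id(message: bytes, domain: str, now_ns=None) -> str:
    """Build <timestamp.hash@domain> from the message (hypothetical)."""
    if now_ns is None:
        # Local timestamp with as much precision as the platform gives.
        now_ns = time.time_ns()
    # A hash of the message supplies the additional unique data.
    digest = hashlib.sha256(message).hexdigest()[:16]
    return f"<{now_ns}.{digest}@{domain}>"

print(sketch_message_id(b"Subject: test\n\nHello.\n", "example.invalid"))
```

An id built this way carries no user id or client marker, only the
send time and a digest.  Swapping the digest for the output of
secrets.token_hex(8) would give the pseudorandom variant.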

One thing that I'd like to request, though, is that if this changes it
would be nice to have an option to keep the current message ID behavior or
something like it.  One of the really neat tricks that the current
behavior allows, provided that one isn't worried about the privacy issues,
is to automatically raise score on any message that is a reply to one of
your messages, since your messages are uniquely identifiable from the
message ID.  I use that technique *heavily*.
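
The idea behind that trick can be sketched outside Gnus as a pattern
match on the References header.  This Python sketch is only an
illustration, not the Gnus score-file mechanism; the function name
replies_to_me and the example pattern (prefix 2t, host
example.invalid) are hypothetical.

```python
import re

def replies_to_me(references: str, own_id_pattern: str) -> bool:
    """True if any message id in References matches the user's pattern."""
    ids = re.findall(r"<[^<>\s]+>", references)
    return any(re.fullmatch(own_id_pattern, mid) for mid in ids)

# Hypothetical pattern: fixed uid prefix, any middle part, .fsf suffix,
# and the user's own host name.
MY_IDS = r"<2t[0-9a-z]+\.fsf@example\.invalid>"

refs = "<abc@news.example> <2t8k3j1x.fsf@example.invalid>"
print(replies_to_me(refs, MY_IDS))  # True
```

A randomized left-hand side would leave only the host name to match
on, which is why an option to keep the predictable form matters for
this use.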

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>




* Re: Repost: Time to revisit the message id generation algorithm?
From: Lars Ingebrigtsen @ 2012-02-02 10:10 UTC
  To: Russ Allbery; +Cc: ding

Russ Allbery <rra@stanford.edu> writes:

> One thing that I'd like to request, though, is that if this changes it
> would be nice to have an option to keep the current message ID behavior or
> something like it.

The privacy leak is pretty minor, I think.  And while it's true that
collisions are more likely to occur than if Gnus were using a more
"random" algo, in practice I don't think anybody has ever reported Gnus
generating a duplicate.

-- 
(domestic pets only, the antidote for overdose, milk.)
  http://lars.ingebrigtsen.no  *  Sent from my Rome



