caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* multi-threaded udp resolver
@ 2000-03-09 22:34 Julian Assange
  2000-03-18 19:17 ` Dave Mason
  2000-03-20 18:17 ` byte manipulation in O'Caml (was: multi-threaded udp resolver) Francois Pottier
  0 siblings, 2 replies; 5+ messages in thread
From: Julian Assange @ 2000-03-09 22:34 UTC (permalink / raw)
  To: caml-list; +Cc: proff


I've previously written a multi-threaded udp dns resolver in c (not
threads so much as a fsm emulating threads). I'd like to, if possible
write one in ocaml directly, rather than simply hooking into the C
code (which wouldn't be that simple anyway, due to the various timers,
management of fd's etc it needs).

While ocaml provides appropriate udp send/receive functions, the best
mechanism for understanding the structure of dns packets is unknown to
me. DNS packets are `loosely' structured. That is, there are many
different structural elements (including arrays of those elements),
and exactly how they are crammed into a packet can only be determined
by reading the structure. i.e the first part of the structure
describes the type (but not structure) of the next strucural element
and so on.

Vixie's named/bind daemon doesn't even attempt to describe the
structure in any sort of data form, but rather uses the code flow
itself to describe the structure (e.g pulling 16 bits, assigning it to
a variable, advancing the interpretation pointer by 16 bits, testing
the variable, pulling 32 bits etc). This method is incredibly
error-prone, and it's hard to see a good way of fitting it in with
ocaml's type system. Any ideas on the best way to approach this problem?

Cheers,
Julian.

-- 
Stefan Kahrs in [Kah96] discusses the
   notion of completeness--programs which never go wrong can be
   type-checked--which complements Milner's notion of
   soundness--type-checked programs never go wrong [Mil78].



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: multi-threaded udp resolver
  2000-03-09 22:34 multi-threaded udp resolver Julian Assange
@ 2000-03-18 19:17 ` Dave Mason
  2000-03-20 18:17 ` byte manipulation in O'Caml (was: multi-threaded udp resolver) Francois Pottier
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Mason @ 2000-03-18 19:17 UTC (permalink / raw)
  To: caml-list

>>>>> On 10 Mar 2000 09:34:47 +1100, Julian Assange <proff@iq.org> said:

> While ocaml provides appropriate udp send/receive functions, the
> best mechanism for understanding the structure of dns packets is
> unknown to me. DNS packets are `loosely' structured. That is, there
>[...]
> Vixie's named/bind daemon doesn't even attempt to describe the
> structure in any sort of data form, but rather uses the code flow
> itself to describe the structure (e.g pulling 16 bits, assigning it
> to a variable, advancing the interpretation pointer by 16 bits,
> testing the variable, pulling 32 bits etc).

I would unpack the packet into a char list and then process it with something like:

let dispatch cont = function
	| t1::t2::rest -> (match (ord t1)*256+(ord t2) with
		| 1 -> type1 cont rest
		| 2 -> type2 cont rest
		| _ -> error...
		)
	| _ -> error...
and type1 cont = function
	| f1::f2::f3::...::rest] -> cont (...f1...f2...f3...) rest
	| _ -> error...
and type1 cont = function
	| f1::f2::rest] -> dispatch (fun v rest' -> cont (...f1...f2...v...) rest') rest
	| _ -> error...
;;

note that cont is a continuation, that will be used to process the
rest of the string after the current value is parsed.

You could also use an ocaml parser.

> This method is incredibly error-prone, and it's hard to see a good
> way of fitting it in with ocaml's type system. Any ideas on the best
> way to approach this problem?

Getting the types right can be a little tricky, but is doable.

../Dave



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: byte manipulation in O'Caml (was: multi-threaded udp resolver)
  2000-03-09 22:34 multi-threaded udp resolver Julian Assange
  2000-03-18 19:17 ` Dave Mason
@ 2000-03-20 18:17 ` Francois Pottier
  1 sibling, 0 replies; 5+ messages in thread
From: Francois Pottier @ 2000-03-20 18:17 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 808 bytes --]


Hello,

> Vixie's named/bind daemon doesn't even attempt to describe the
> structure in any sort of data form, but rather uses the code flow
> itself to describe the structure (e.g pulling 16 bits, assigning it to
> a variable, advancing the interpretation pointer by 16 bits, testing
> the variable, pulling 32 bits etc). This method is incredibly
> error-prone, and it's hard to see a good way of fitting it in with
> ocaml's type system. Any ideas on the best way to approach this problem?

I have written an O'Caml module, called Zone, which helps writing this
kind of code by providing a set of rather high-level manipulation
primitives for loosely-structured memory blocks. I attach the code
below.

Hope this helps,

-- 
François Pottier
Francois.Pottier@inria.fr
http://pauillac.inria.fr/~fpottier/

[-- Attachment #2: zone.mli --]
[-- Type: text/plain, Size: 4590 bytes --]

(* $Header: /home/pauillac/formel1/fpottier/cvs/laurel/zone.mli,v 1.6 2000/03/01 14:20:44 fpottier Exp $ *)

(* This module defines zones. These objects define an immutable zone in a larger string. They come equipped with a
   mutable ``read head'', which allows easily reading a sequence of elements from a zone. *)

type t

(* This function creates a zone out of a string. The zone spans the whole string. Its read head is initially located
   at the beginning of the zone. *)

val create: string -> t

(* This exception is raised when attempting to read past the end of a zone. *)

exception EOZ

(* This function reads an unsigned 8-bit integer at the read head's current position. After reading, the read head
   advances by one byte. *)

val u1: t -> int

(* This function reads a signed 8-bit integer at the read head's current position. After reading, the read head
   advances by one byte. *)

val s1: t -> int

(* This function reads an unsigned 16-bit integer, in big-endian order, at the read head's current position. After
   reading, the read head advances by two bytes. *)

val u2: t -> int

(* This function reads a signed 16-bit integer, in big-endian order, at the read head's current position. After
   reading, the read head advances by two bytes. *)

val s2: t -> int

(* This function reads an unsigned 16-bit integer, in big-endian order, at the read head's current position. The
   read head does not advance. *)

val peek2: t -> int

(* This function reads a signed 32-bit integer, in big-endian order, at the read head's current position. After
   reading, the read head advances by four bytes. TEMPORARY Because O'Caml integers are signed 31-bit, the result
   will be incorrect if the integer stored in the zone has more than 30 significant bits. *)

val s4: t -> int

(* This function reads a sub-zone of length [length] at the read head's current position. After reading, the
   read head advances by [length] bytes. The newly created zone uses the same data buffer as its parent zone.
   Its read head is initially located at the beginning of the sub-zone. *)

val sub: t -> int -> t

(* This function converts a zone into a string. It is the sub-string of the original buffer string covered by
   the zone. The zone's read head position is irrelevant, and unaffected. *)

val string: t -> string

(* This function reads in a ``table'' consisting of an element count (to be read by [rc]), followed by as many
   (possibly variable-length) elements. [rc] should typically be [u1] or [u2], depending on whether the element count
   is stored using one or two bytes. The [element] auxiliary function is invoked to read each element. It finds the
   read head positioned at the beginning of the element, and must leave it at the end of the element. The table is
   returned as a list, where the first element found in the zone appears first. The read head is advanced to the end
   of the table. *)

val table: t -> (t -> int) -> (t -> 'a) -> 'a list

(* This function reads in a ``table'' consisting of an undetermined number of (possibly variable-length) elements. The
   end of the table is deemed to coincide with the end of the zone. The [element] auxiliary function is invoked to
   read each element. It finds the read head positioned at the beginning of the element, and must leave it at the end
   of the element. The table is returned as a list, where the first element found in the zone appears first. The read
   head is advanced to the end of the table. *)

val undetermined_table: t -> (t -> 'a) -> 'a list

(* This function reads in a ``table'' consisting of an undetermined number of (possibly variable-length) elements. The
   end of the table is deemed to coincide with the end of the zone. The [element] auxiliary function is invoked to
   read each element. It finds the read head positioned at the beginning of the element, and must leave it at the end
   of the element. The table is returned as a hash table which maps zone offsets to elements. The zone's read head is
   advanced to the end of the table. *)

val hash_table: t -> (t -> 'a) -> (int, 'a) Hashtbl.t

(* This function reads in a ``Pascal'' string, i.e. a string prefixed with its length in bytes. The length field is
   to be read by [rc]. The read head is advanced to the end of the string. *)

val pascal: t -> (t -> int) -> string

(* This function returns the zone's current read head position. *)

val head: t -> int

(* This function sets the zone's read head position. *)

val seek: t -> int -> unit

(* This function returns a zone's length. *)

val length: t -> int


[-- Attachment #3: zone.ml --]
[-- Type: text/plain, Size: 7186 bytes --]

(* $Header: /home/pauillac/formel1/fpottier/cvs/laurel/zone.ml,v 1.9 2000/03/02 11:07:26 fpottier Exp $ *)

(* This module defines zones. These objects define an immutable zone in a larger string. They come equipped with a
   mutable ``read head'', which allows easily reading a sequence of elements from a zone. *)

type t = {

    (* The (supposedly immutable) area where the data is stored. *)

    string: string;
    
    (* The zone start offset, comprised between 0 and [string]'s length, inclusive. *)

    start: int;

    (* The zone's length, comprised between 0 and [string]'s length minus [start], inclusive. *)

    length: int;

    (* The zone's read head position, comprised between 0 and [length], inclusive. *)

    mutable head: int

  } 

(* This function creates a zone out of a string. The zone spans the whole string. Its read head is initially located
   at the beginning of the zone. *)

let create string = {
  string = string;
  start = 0;
  length = String.length string;
  head = 0
} 

(* This exception is raised when attempting to read past the end of a zone. *)

exception EOZ

(* This auxiliary function simulates the head's advance, and raises [EOZ] if it would go past the end of the zone.
   It returns the head's new position, but does not actually move it. *)

let peek_ahead zone offset =
  let position = zone.head + offset in
  if position > zone.length then
    raise EOZ;
  position

(* This auxiliary function simulates the head's advance, and raises [EOZ] if it would go past the end of the zone.
   Then, it calls its [action] function; when the function returns, it actually advances the read head, unless some
   exception occurred. *)

let advance zone offset action =
  let position = peek_ahead zone offset in
  let result = action() in
  zone.head <- position;
  result

(* This auxiliary function converts an unsigned integer into a signed one. The [mask] indicates the position of the
   sign bit. *)

let signed mask x =
  if x land mask = 0 then
    x
  else
    x - (mask lsl 1)

(* This function reads an unsigned 8-bit integer at the read head's current position. After reading, the read head
   advances by one byte. *)

let u1 zone =
  advance zone 1 (fun () ->
    Char.code zone.string.[zone.start + zone.head]
  )

(* This function reads a signed 8-bit integer at the read head's current position. After reading, the read head
   advances by one byte. *)

let s1 zone =
  signed 0x80 (u1 zone)

(* This function reads an unsigned 16-bit integer, in big-endian order, at the read head's current position. After
   reading, the read head advances by two bytes. *)

let u2 zone =
  let hi = u1 zone in
  let lo = u1 zone in
  (hi lsl 8) + lo

(* This function reads a signed 16-bit integer, in big-endian order, at the read head's current position. After
   reading, the read head advances by two bytes. *)

let s2 zone =
  signed 0x8000 (u2 zone)

(* This function reads an unsigned 16-bit integer, in big-endian order, at the read head's current position. The
   read head does not advance. *)

let peek2 zone =
  let _ = peek_ahead zone 2 in
  let hi = Char.code zone.string.[zone.start + zone.head] in
  let lo = Char.code zone.string.[zone.start + zone.head + 1] in
  (hi lsl 8) + lo

(* This function reads a signed 32-bit integer, in big-endian order, at the read head's current position. After
   reading, the read head advances by four bytes. TEMPORARY Because O'Caml integers are signed 31-bit, the result
   will be incorrect if the integer stored in the zone has more than 30 significant bits. *)

let s4 zone =
  let hi = u2 zone in
  let lo = u2 zone in
  (hi lsl 16) + lo

(* This function reads a sub-zone of length [length] at the read head's current position. After reading, the
   read head advances by [length] bytes. The newly created zone uses the same data buffer as its parent zone.
   Its read head is initially located at the beginning of the sub-zone. *)

let sub zone length =
  advance zone length (fun () -> {
    string = zone.string;
    start = zone.start + zone.head;
    length = length;
    head = 0
  })

(* This function converts a zone into a string. It is the sub-string of the original buffer string covered by
   the zone. The zone's read head position is irrelevant, and unaffected. *)

let string zone =
  String.sub zone.string zone.start zone.length

(* This function reads in a ``table'' consisting of an element count (to be read by [rc]), followed by as many
   (possibly variable-length) elements. [rc] should typically be [u1] or [u2], depending on whether the element count
   is stored using one or two bytes. The [element] auxiliary function is invoked to read each element. It finds the
   read head positioned at the beginning of the element, and must leave it at the end of the element. The table is
   returned as a list, where the first element found in the zone appears first. The read head is advanced to the end
   of the table. *)

let table zone rc element =
  let rec read count =
    if count = 0 then
      []
    else
      let x = element zone in
      x :: (read (count - 1)) in
  read (rc zone)

(* This function reads in a ``table'' consisting of an undetermined number of (possibly variable-length) elements. The
   end of the table is deemed to coincide with the end of the zone. The [element] auxiliary function is invoked to
   read each element. It finds the read head positioned at the beginning of the element, and must leave it at the end
   of the element. The table is returned as a list, where the first element found in the zone appears first. The read
   head is advanced to the end of the table. *)

let rec undetermined_table zone element =
  if zone.head = zone.length then
    []
  else
    let head = element zone in
    head :: (undetermined_table zone element)

(* This function reads in a ``table'' consisting of an undetermined number of (possibly variable-length) elements. The
   end of the table is deemed to coincide with the end of the zone. The [element] auxiliary function is invoked to
   read each element. It finds the read head positioned at the beginning of the element, and must leave it at the end
   of the element. The table is returned as a hash table which maps zone offsets to elements. The zone's read head is
   advanced to the end of the table. *)

let hash_table zone element =
  let table = Hashtbl.create 1023 in
  let rec read () =
    if zone.head = zone.length then
      ()
    else
      let offset = zone.head in
      Hashtbl.add table offset (element zone);
      read() in
  read();
  table

(* This function reads in a ``Pascal'' string, i.e. a string prefixed with its length in bytes. The length field is
   to be read by [rc]. The read head is advanced to the end of the string. *)

let pascal zone rc =
  string (sub zone (rc zone))

(* This function returns the zone's current read head position. *)

let head zone =
  zone.head

(* This function sets the zone's read head position. *)

let seek zone position =
  assert (position >= 0);
  if position > zone.length then
    raise EOZ;
  zone.head <- position

(* This function returns a zone's length. *)

let length zone =
  zone.length


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: multi-threaded udp resolver
@ 2000-03-20 17:15 Manuel Fahndrich
  0 siblings, 0 replies; 5+ messages in thread
From: Manuel Fahndrich @ 2000-03-20 17:15 UTC (permalink / raw)
  To: 'Julian Assange', 'caml-list@inria.fr'


Have a look at

Satish Chandra and Peter J. McCann, "Packet Types", in the Second Workshop
on Compiler Support for Systems Software (WCSSS), May 1999. 

The paper tries to formalize the structure of self-describing data
structures such as network packets.

Not that this is implemented in OCaml, but maybe a CAMLp4 prepass would
help.

-Manuel


-----Original Message-----
From: Julian Assange [mailto:proff@iq.org]
Sent: Saturday, March 18, 2000 9:57 AM
To: caml-redistribution@pauillac.inria.fr
Cc: proff@iq.org
Subject: multi-threaded udp resolver



I've previously written a multi-threaded udp dns resolver in c (not
threads so much as a fsm emulating threads). I'd like to, if possible
write one in ocaml directly, rather than simply hooking into the C
code (which wouldn't be that simple anyway, due to the various timers,
management of fd's etc it needs).

While ocaml provides appropriate udp send/receive functions, the best
mechanism for understanding the structure of dns packets is unknown to
me. DNS packets are `loosely' structured. That is, there are many
different structural elements (including arrays of those elements),
and exactly how they are crammed into a packet can only be determined
by reading the structure. i.e the first part of the structure
describes the type (but not structure) of the next strucural element
and so on.

Vixie's named/bind daemon doesn't even attempt to describe the
structure in any sort of data form, but rather uses the code flow
itself to describe the structure (e.g pulling 16 bits, assigning it to
a variable, advancing the interpretation pointer by 16 bits, testing
the variable, pulling 32 bits etc). This method is incredibly
error-prone, and it's hard to see a good way of fitting it in with
ocaml's type system. Any ideas on the best way to approach this problem?

Cheers,
Julian.

-- 
Stefan Kahrs in [Kah96] discusses the
   notion of completeness--programs which never go wrong can be
   type-checked--which complements Milner's notion of
   soundness--type-checked programs never go wrong [Mil78].



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re:  multi-threaded udp resolver
@ 2000-03-10 19:05 Damien Doligez
  0 siblings, 0 replies; 5+ messages in thread
From: Damien Doligez @ 2000-03-10 19:05 UTC (permalink / raw)
  To: caml-list

>From: Julian Assange <proff@iq.org>

>While ocaml provides appropriate udp send/receive functions, the best
>mechanism for understanding the structure of dns packets is unknown to
>me.

What you describe looks like a parsing problem to me.  I would try to
use ocamlyacc or stream parsers if I had to solve your problem.  But
I've never done anything like that, so I can't guarantee it's really
the best way to do it.

-- Damien



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2000-03-21 14:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-03-09 22:34 multi-threaded udp resolver Julian Assange
2000-03-18 19:17 ` Dave Mason
2000-03-20 18:17 ` byte manipulation in O'Caml (was: multi-threaded udp resolver) Francois Pottier
2000-03-10 19:05 multi-threaded udp resolver Damien Doligez
2000-03-20 17:15 Manuel Fahndrich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).