Requirements for new dns backend, factoring considerations

mailing list of musl libc
 help / color / mirror / code / Atom feed

* Requirements for new dns backend, factoring considerations
@ 2014-06-01  6:31 Rich Felker
  2014-06-01  8:01 ` u-igbb
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Rich Felker @ 2014-06-01  6:31 UTC (permalink / raw)
  To: musl

I'm partly just writing this as my own notes to document the design
process, and partly looking for feedback on some of these decisions
for the new dns backend. With that said, here's a lost of components
that will be using it:

1. __lookup_name (getaddrinfo/gethostbyname* backend), which must be
   able to perform parallel lookups of A and AAAA records, and
   multiple search domain suffixes. For A/AAAA, both answers are
   needed. For search domains, all results except the best for each
   can be thrown away.

2. getnameinfo, which only needs a single PTR query.

3. res_search and res_query, whose requirements are similar to those
   of __lookup_name.

4. res_send, which makes a single query using a caller-provided
   query packet.

At first it seems like __lookup_name's dns backend could be
implemented on top of res_search, which could be implemented as
multiple calls to res_query in the search order, which could in turn
be implemented as a call to res_send.

The problem however is the different parallelism requirements.

It's suboptimal to implement res_search in terms of res_query because
you have to wait for one search suffix to fail before trying the next
one. At the res_search/res_query level it's not so bad though. The
parallel search can be the fundamental operation, and res_search and
res_query can just call a common back-end with an argument for whether
to search or not.

The problem however is implementing this on top of something that
looks like res_send. Even if not for search paths, res_search and
res_query need parallel A and AAAA queries, whereas res_send has no
means to request both. We could imagine implementing res_send on top
of a hypothetical "res_multisend" with a count of 1. While this would
work, it's not a very friendly interface for implementing res_search
or res_query since they would have to provide a large number of
pre-generated query packets (6 search domains * 2 RR types * up to 280
bytes per query packet = 3360 bytes of stack usage) despite it being
trivial to generate the Nth packet in just 280 bytes of storage. The
storage requirements for storing all the results are even worse
(6*2*512 = 6144) compared to what's actually needed (2*512 = 1024).

The alternative I see is some sort of "res_multisend" that uses a
callback to let the caller generate the Nth packet and a callback to
notify the caller of replies. Then res_send could be implemented with
callbacks that just feed in and save out the single query/response.
And res_search would generate all the query packets but only save the
"current best match" for each address family.

As another alternative, we could drop the goal of doing search
suffixes in parallel. This would have no bearing on lookups of fully
qualified names using the default settings (ndots:1) since the
presence of a dot suppresses search. Where it would negatively impact
performance, though, is for users who want to have several search
domains (think: home network, university network, department-specific
university network, etc.) for quick shortcuts to machines on multiple
different networks.

Another option still is leaving search domains unimplemented (musl has
not supported them up til now, and there hasn't been much request for
them). But if there is, or will be, demand for them, I don't want the
resolver overhaul design to preclude doing them (or preclude making
them perform decently).

Anyway, if we do decide that parallel search suffixes are not
important, parallel A/AAAA queries are still needed and outside the
scope of what res_send can do. It would be plausible to make an
interface like res_send, in terms of which res_send could be
implemented, but which has an extra boolean argument for "send both A
and AAAA versions of this query". Or we could just accept the cost of
having two copies of the query packet (an extra 280 bytes is
negligible, I think) and make a trivial res_multisend that does N(=2)
queries in parallel using packets provided by the caller.

Rich

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Requirements for new dns backend, factoring considerations
  2014-06-01  6:31 Requirements for new dns backend, factoring considerations Rich Felker
@ 2014-06-01  8:01 ` u-igbb
  2014-06-01 14:36   ` Rich Felker
  2014-06-01 11:19 ` Laurent Bercot
  2014-06-01 15:08 ` Rich Felker
  2 siblings, 1 reply; 6+ messages in thread
From: u-igbb @ 2014-06-01  8:01 UTC (permalink / raw)
  To: musl

Hello Rich,

On Sun, Jun 01, 2014 at 02:31:03AM -0400, Rich Felker wrote:
> process, and partly looking for feedback on some of these decisions

Here you are:

> As another alternative, we could drop the goal of doing search
> suffixes in parallel. This would have no bearing on lookups of fully
> qualified names using the default settings (ndots:1) since the
> presence of a dot suppresses search. Where it would negatively impact
> performance, though, is for users who want to have several search
> domains (think: home network, university network, department-specific
> university network, etc.) for quick shortcuts to machines on multiple
> different networks.

My experience is that such kind of shortcuts is dangerous and inconsistent.
They stir different namespaces, this can not give a reliable outcome
in a general case.

What a certain shortcut resolves to depends on too many things and among
others on which changes are made by third parties to the contents of
the name spaces which you short-circuit (a new host in one's department
can easily take the place of a desired host at a different department).

So I would not care less about efficiency of an uncertain and inconsistent
practice/tool :)

> Another option still is leaving search domains unimplemented (musl has
> not supported them up til now, and there hasn't been much request for

As I see this, spending your time on other things might be a better choice.

> them). But if there is, or will be, demand for them, I don't want the
> resolver overhaul design to preclude doing them

This is surely reasonable.

> (or preclude making
> them perform decently).

I guess it is very few people in rare situations who might be hit by
performance issues there, which would most probably also imply that they
have a badly thought-out setup.

So much for the feedback.

Thanks for your work Rich.

Rune

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Requirements for new dns backend, factoring considerations
  2014-06-01  6:31 Requirements for new dns backend, factoring considerations Rich Felker
  2014-06-01  8:01 ` u-igbb
@ 2014-06-01 11:19 ` Laurent Bercot
  2014-06-01 14:53   ` Rich Felker
  2014-06-01 15:08 ` Rich Felker
  2 siblings, 1 reply; 6+ messages in thread
From: Laurent Bercot @ 2014-06-01 11:19 UTC (permalink / raw)
  To: musl

  Hi Rich,
  Great work, as usual.

> The problem however is implementing this on top of something that
> looks like res_send. Even if not for search paths, res_search and
> res_query need parallel A and AAAA queries, whereas res_send has no
> means to request both. We could imagine implementing res_send on top
> of a hypothetical "res_multisend" with a count of 1. While this would
> work, it's not a very friendly interface for implementing res_search
> or res_query since they would have to provide a large number of
> pre-generated query packets (6 search domains * 2 RR types * up to 280
> bytes per query packet = 3360 bytes of stack usage) despite it being
> trivial to generate the Nth packet in just 280 bytes of storage. The
> storage requirements for storing all the results are even worse
> (6*2*512 = 6144) compared to what's actually needed (2*512 = 1024).

  I actually never thought about that. Since s6-dns stores answers in the
heap, it doesn't have to pre-allocate storage for them, so it happily
sends everything in parallel.

> The alternative I see is some sort of "res_multisend" that uses a
> callback to let the caller generate the Nth packet and a callback to
> notify the caller of replies. Then res_send could be implemented with
> callbacks that just feed in and save out the single query/response.
> And res_search would generate all the query packets but only save the
> "current best match" for each address family.

  That sounds reasonable.

> As another alternative, we could drop the goal of doing search
> suffixes in parallel.

  The best choice depends on your timeout values and retry policy.

  As I already said: this is network, so it's going to suck, and this is
DNS, so it's going to suck even more. If a server does not answer, which
will happen, and you have to wait for 20 seconds before considering it a
failure, then serial search is just going to make users mad. Multiple
server failure is a common occurrence (client-side network issue, for
instance); I'd rather have a clue before 2 minutes, and most users will
^C anything that takes longer than 30 seconds - and ^C doesn't bring any
debugging information.

  So if your timeouts are very short, sure, serial search will work. But
if they are not, I'd say parallel search or drop search entirely. To me,
even 6k of stack for every getaddrinfo() invocation is better than
potentially huge network latency, especially since the getaddrinfo
(or even the res_*) API provides no way of setting bounds to the waiting
time.

  This is the 21st century, and even though I hate being wasteful as much
as you do, memory is definitely cheaper than users' time. And memory
usage - especially stack memory - is so much more trackable than network
latency.

> negligible, I think) and make a trivial res_multisend that does N(=2)
> queries in parallel using packets provided by the caller.

  That's what the synchronous s6-dns resolver does, and I think that's
the least amount of parallelism that's still acceptable in a v4+v6 world.

-- 
  Laurent

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Requirements for new dns backend, factoring considerations
  2014-06-01  8:01 ` u-igbb
@ 2014-06-01 14:36   ` Rich Felker
  0 siblings, 0 replies; 6+ messages in thread
From: Rich Felker @ 2014-06-01 14:36 UTC (permalink / raw)
  To: musl

On Sun, Jun 01, 2014 at 10:01:51AM +0200, u-igbb@aetey.se wrote:
> > As another alternative, we could drop the goal of doing search
> > suffixes in parallel. This would have no bearing on lookups of fully
> > qualified names using the default settings (ndots:1) since the
> > presence of a dot suppresses search. Where it would negatively impact
> > performance, though, is for users who want to have several search
> > domains (think: home network, university network, department-specific
> > university network, etc.) for quick shortcuts to machines on multiple
> > different networks.
> 
> My experience is that such kind of shortcuts is dangerous and inconsistent.
> They stir different namespaces, this can not give a reliable outcome
> in a general case.

I tend to agree with this a lot. As long as you leave ndots=1, your
shortcuts have no dots in them, and you only have one search domain, I
think the results can be fairly reliable/consistent. But in general
you're right. I'm not sure if other implementations fall back to
searching when this_query.dots>=ndots and looking up this_query as a
fqdn failed, but doing so sounds like bad and dangerous behavior and
I'd strongly prefer not to do it.

> What a certain shortcut resolves to depends on too many things and among
> others on which changes are made by third parties to the contents of
> the name spaces which you short-circuit (a new host in one's department
> can easily take the place of a desired host at a different department).

Yes, I think this feature is harmful whenever third-party changes can
affect _which_ result you see and not just whether you see a result or
not. Note that any use of ndots>1 leads to such issues (e.g. do you get
google.com or google.com.example.com?).

> > Another option still is leaving search domains unimplemented (musl has
> > not supported them up til now, and there hasn't been much request for
> 
> As I see this, spending your time on other things might be a better choice.

If we don't want to accelerate it with parallel lookups, leaving it
for later, and only if there's demand, is a perfectly viable strategy.
It's only a consideration now if parallelism is considered important.

Rich

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Requirements for new dns backend, factoring considerations
  2014-06-01 11:19 ` Laurent Bercot
@ 2014-06-01 14:53   ` Rich Felker
  0 siblings, 0 replies; 6+ messages in thread
From: Rich Felker @ 2014-06-01 14:53 UTC (permalink / raw)
  To: musl

On Sun, Jun 01, 2014 at 12:19:44PM +0100, Laurent Bercot wrote:
> 
>  Hi Rich,
>  Great work, as usual.
> 
> 
> >The problem however is implementing this on top of something that
> >looks like res_send. Even if not for search paths, res_search and
> >res_query need parallel A and AAAA queries, whereas res_send has no
> >means to request both. We could imagine implementing res_send on top
> >of a hypothetical "res_multisend" with a count of 1. While this would
> >work, it's not a very friendly interface for implementing res_search
> >or res_query since they would have to provide a large number of
> >pre-generated query packets (6 search domains * 2 RR types * up to 280
> >bytes per query packet = 3360 bytes of stack usage) despite it being
> >trivial to generate the Nth packet in just 280 bytes of storage. The
> >storage requirements for storing all the results are even worse
> >(6*2*512 = 6144) compared to what's actually needed (2*512 = 1024).
> 
>  I actually never thought about that. Since s6-dns stores answers in the
> heap, it doesn't have to pre-allocate storage for them, so it happily
> sends everything in parallel.

Well the important part is the same: dirty pages. Whether it's on the
heap or the stack, an extra 9k of temp data means touching 2-3 extra
pages that may have previously been untouched. If this just happens at
startup, the memory usage persists for the rest of the process's
lifetime and it's essentially wasted.

Also the 9k is just _additional_ here. There are already several
0.5k-1k buffers for accessing files, storing address results, storing
the canonical name, etc. and it quickly adds up. It doesn't reach the
threshold where I'd say "this isn't reasonable to assume we have
available on the stack" but it's cost and probably throws getaddrinfo
into being "musl's biggest stack user" by a nontrivial margin
(otherwise printf with float is the biggest).

> >The alternative I see is some sort of "res_multisend" that uses a
> >callback to let the caller generate the Nth packet and a callback to
> >notify the caller of replies. Then res_send could be implemented with
> >callbacks that just feed in and save out the single query/response.
> >And res_search would generate all the query packets but only save the
> >"current best match" for each address family.
> 
>  That sounds reasonable.

It's definitely reasonable from an efficiency standpoint, but it's
also the most complex approach (storing the working set in structs and
passing a context back and forth, how to pass buffers back and forth
without wasteful copying, etc.), and possibly has the largest code
size too.

> >As another alternative, we could drop the goal of doing search
> >suffixes in parallel.
> 
>  The best choice depends on your timeout values and retry policy.

Current retry time is 1s and failure timeout is 5s. These should be
configurable, but to do that, we need a way of reinterpreting
resolv.conf's timeout settings for the way musl does things: musl does
not wait for one nameserver to timeout then fallback to the next, but
queries them in parallel. (I know some people have doubts about this,
but in practice it results in massive performance improvement for
resolving, especially if you have several nameservers with different
latency and caching properties, such as localhost, isp-nameserver, and
8.8.8.8.)

>  So if your timeouts are very short, sure, serial search will work. But

Timeout is not really the relevant factor unless your nameservers are
misconfigured. A properly configured nameserver returns a negative
response rather than just timing out, but it might not cache negative
results (or might not cache them long) so the request may have latency
higher than a typical request due to repeating the whole recursive
lookup, but it still should be nowhere near as long as a timeout.

> >negligible, I think) and make a trivial res_multisend that does N(=2)
> >queries in parallel using packets provided by the caller.
> 
>  That's what the synchronous s6-dns resolver does, and I think that's
> the least amount of parallelism that's still acceptable in a v4+v6 world.

That's what musl's current implementation does too, and this is not a
feature I'd want to drop. In fact I consider it essential to adoption
of IPv6; otherwise everyone will disable IPv6 because it "makes DNS
lookups slower".

Rich

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Requirements for new dns backend, factoring considerations
  2014-06-01  6:31 Requirements for new dns backend, factoring considerations Rich Felker
  2014-06-01  8:01 ` u-igbb
  2014-06-01 11:19 ` Laurent Bercot
@ 2014-06-01 15:08 ` Rich Felker
  2 siblings, 0 replies; 6+ messages in thread
From: Rich Felker @ 2014-06-01 15:08 UTC (permalink / raw)
  To: musl

On Sun, Jun 01, 2014 at 02:31:03AM -0400, Rich Felker wrote:
> [...]
> Anyway, if we do decide that parallel search suffixes are not
> important, parallel A/AAAA queries are still needed and outside the
> scope of what res_send can do. It would be plausible to make an
> interface like res_send, in terms of which res_send could be
> implemented, but which has an extra boolean argument for "send both A
> and AAAA versions of this query". Or we could just accept the cost of
> having two copies of the query packet (an extra 280 bytes is
> negligible, I think) and make a trivial res_multisend that does N(=2)
> queries in parallel using packets provided by the caller.

Based on the responses so far and my own thoughts on the matter, the
way I'm leaning is:

- Parallelizing (or even supporting) search domains is low-priority,
  but it's best to have a design that doesn't pessimize them if we do
  want to add them or (worse yet) require major redesign again.

- The complexity of callbacks, complex interface contracts between
  layers, etc. is not worth it to save a little stack space.

- Parallelism of A/AAAA like we have now is essential.

So the design I'm leaning towards is a simple res_msend function which
takes a list of caller-provided query packets and produces a result
for each one, which the caller can then interpret. res_send is then
just calling this function with N=1.

At first, search domains would not even be implemented. If we want to
add them later, we would in principle have the flexibility to choose
between sequential queries via multiple calls to res_msend, or one
call to res_msend with a much larger N value, but the search would
then not terminate when the preferred results were already obtained
but results for the fallbacks were still pending. So I think, unless
we wanted to expand res_msend to make some of the queries "weak", we'd
be stuck with the sequential fallback. I don't see this as a major
problem though.

As a variant, res_msend could take a list of iovec structures rather
than flat buffers, so that part of the query could be shared between
queries. This slightly pessimizes short name lookups, I think, but for
near-maximum-length names it would drop the data size from 560 down to
~320. Probably not worth the complexity, but I considered it just
because sendmsg() accepts iovecs anyway.

Still open to further comments, but if nobody has any objections or
new thoughts that cause me to rethink things, I'll probably start
implementing this in the next 12-24 hours.

Rich

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-06-01 15:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-01  6:31 Requirements for new dns backend, factoring considerations Rich Felker
2014-06-01  8:01 ` u-igbb
2014-06-01 14:36   ` Rich Felker
2014-06-01 11:19 ` Laurent Bercot
2014-06-01 14:53   ` Rich Felker
2014-06-01 15:08 ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).