[9fans] Go: FP in note handler

9fans - fans of the OS Plan 9 from Bell Labs

* [9fans]  Go: FP in note handler
       [not found] <fb16aa69c6b73a2e8f0260b4e6ee025c@hamnavoe.com>
@ 2016-02-22 21:56 ` Kenny Lasse Hoff Levinsen
  0 siblings, 0 replies; 9+ messages in thread
From: Kenny Lasse Hoff Levinsen @ 2016-02-22 21:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

For those interested in the matter, I have opened https://github.com/golang/go/issues/14471

I mention potentially re-enabling duffcopy by writing some magic note handler code that avoids the regular copy and zero optimizations, but I’m not entirely sure that’s a plausible path. If it is, I think it would bring two benefits: the performance gained from duffcopy/duffzero, and a lower chance of this happening again. It is, however, slightly annoying to do, as you cannot use copy(), make(), or even string or byte array literals, as these will trip duffcopy and duffzero. Any comments on my silly idea?

Best regards,
Kenny Levinsen

> On 22 Feb 2016, at 18:16, Richard Miller <miller@hamnavoe.com> wrote:
> 
>> The trace of goexitsall still contains FP register accesses (XORPS, and duffzero, which contains MOVUPS)
> 
> Sorry, in that case I think my patch is not relevant for your issue
> (but it does prevent a deadlock on multiprocessors which you might
> also run into...)
> 


* Re: [9fans] Go: FP in note handler
  2016-02-23 17:31           ` lucio
@ 2016-02-23 17:56             ` Kenny Lasse Hoff Levinsen
  0 siblings, 0 replies; 9+ messages in thread
From: Kenny Lasse Hoff Levinsen @ 2016-02-23 17:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


> On 23 Feb 2016, at 18:31, lucio@proxima.alt.za wrote:
> 
>> A proper duffcopy/duffzero/memmove is also an option.
> 
> The adjective "proper" is revealing.  I vote for that.
> 
> Lucio.
> 
> 

It’s a bit out of my usual area of expertise, however. I have no idea what benchmark they have been running, either. Any pointers?

Kenny



* Re: [9fans] Go: FP in note handler
  2016-02-23 17:14         ` Kenny Lasse Hoff Levinsen
@ 2016-02-23 17:31           ` lucio
  2016-02-23 17:56             ` Kenny Lasse Hoff Levinsen
  0 siblings, 1 reply; 9+ messages in thread
From: lucio @ 2016-02-23 17:31 UTC (permalink / raw)
  To: 9fans

> A proper duffcopy/duffzero/memmove is also an option.

The adjective "proper" is revealing.  I vote for that.

Lucio.





* Re: [9fans] Go: FP in note handler
  2016-02-23 17:02       ` erik quanstrom
@ 2016-02-23 17:14         ` Kenny Lasse Hoff Levinsen
  2016-02-23 17:31           ` lucio
  0 siblings, 1 reply; 9+ messages in thread
From: Kenny Lasse Hoff Levinsen @ 2016-02-23 17:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

A proper duffcopy/duffzero/memmove is also an option.

Best regards,
Kenny Levinsen

> On 23 Feb 2016, at 18:02, erik quanstrom <quanstro@quanstro.net> wrote:
> 
>> On Tue Feb 23 07:55:26 PST 2016, kennylevinsen@gmail.com wrote:
>> A benchmark was supposedly made of the new duffcopy/duffzero which claimed significant speedup for larger copies: https://github.com/golang/go/commit/5cf281a9b791f0f10efd1574934cbb19ea1b33da
>> 
>> I have no clue whether this holds true or not. My intention in re-enabling duffcopy and continuing to use duffzero is mostly to avoid differences and to ensure that the note handlers stay floating point free in the future. Whether duffcopy/duffzero in its current form is an actual optimization or just added complexity, I cannot say. A test was run in #cat-v out of annoyance, and the result seemed to be that MOVUPS was indeed faster, but I don’t remember the details.
> 
> that post is a speedup relative to the original asm, which might not be as good as the best
> non-sse versions, and it is also for amd64.
> 
> - erik
> 




* Re: [9fans] Go: FP in note handler
  2016-02-23 15:52     ` Kenny Lasse Hoff Levinsen
@ 2016-02-23 17:02       ` erik quanstrom
  2016-02-23 17:14         ` Kenny Lasse Hoff Levinsen
  0 siblings, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2016-02-23 17:02 UTC (permalink / raw)
  To: 9fans

On Tue Feb 23 07:55:26 PST 2016, kennylevinsen@gmail.com wrote:
> A benchmark was supposedly made of the new duffcopy/duffzero which claimed significant speedup for larger copies: https://github.com/golang/go/commit/5cf281a9b791f0f10efd1574934cbb19ea1b33da
> 
> I have no clue whether this holds true or not. My intention in re-enabling duffcopy and continuing to use duffzero is mostly to avoid differences and to ensure that the note handlers stay floating point free in the future. Whether duffcopy/duffzero in its current form is an actual optimization or just added complexity, I cannot say. A test was run in #cat-v out of annoyance, and the result seemed to be that MOVUPS was indeed faster, but I don’t remember the details.

that post is a speedup relative to the original asm, which might not be as good as the best
non-sse versions, and it is also for amd64.

- erik




* Re: [9fans] Go: FP in note handler
  2016-02-23 15:27   ` erik quanstrom
@ 2016-02-23 15:52     ` Kenny Lasse Hoff Levinsen
  2016-02-23 17:02       ` erik quanstrom
  0 siblings, 1 reply; 9+ messages in thread
From: Kenny Lasse Hoff Levinsen @ 2016-02-23 15:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

A benchmark was supposedly made of the new duffcopy/duffzero which claimed significant speedup for larger copies: https://github.com/golang/go/commit/5cf281a9b791f0f10efd1574934cbb19ea1b33da

I have no clue whether this holds true or not. My intention in re-enabling duffcopy and continuing to use duffzero is mostly to avoid differences and to ensure that the note handlers stay floating point free in the future. Whether duffcopy/duffzero in its current form is an actual optimization or just added complexity, I cannot say. A test was run in #cat-v out of annoyance, and the result seemed to be that MOVUPS was indeed faster, but I don’t remember the details.

Best regards,
Kenny Levinsen

> On 23 Feb 2016, at 16:27, erik quanstrom <quanstro@quanstro.net> wrote:
> 
> On Tue Feb 23 02:36:41 PST 2016, kennylevinsen@gmail.com wrote:
>> Ah, no - it is not a system-wide adjustment, but an adjustment of the plan9-specific runtime.sighandler implementation and everything called by it directly. Notes that don't exit the process are queued and should run outside the actual note handler.
>> 
>> I think the "magic" code will be isolated, and might fend off accidental future additions of floating point registers. The magic-ness also only revolves around avoiding duffzero and duffcopy in some way. I also think that removing conditionals in the compiler will be a positive thing.
>> 
>> I still do not know the feasibility of my plan, whether it is possible to do cleanly, or possible at all. Maybe someone smarter than me with knowledge on the matter could chime in and call me an idiot?
>> 
>> Avoiding duffcopy should be easy with a simple memmove implementation. If done right, we can also remove the plan9-specific runtime.memmove and only use the slow memmove in sighandler. (The global runtime.memmove is implemented using MOVUPS, just like duffcopy. Duffcopy is used for block copies by the compiler in some cases, although I must admit to not knowing all the cases yet.)
>> 
>> Avoiding duffzero without compiler assistance is a bit more tricky - global variables, stack on assembly functions, something like that.
> 
> fwiw, on modern amd64 machines, using the xmm and ymm registers has a benefit only in a narrow range
> of sizes (384-511 bytes) and a subset of (mis-)alignments that i've forgotten.  at least for the exact test setup
> i used on 3-4 different µarches.  intel claims rep; movs is the (architecturally) fastest way to go.
> 
> i am not sure any of this makes much difference, as it's hard to know what a real-world memory
> access pattern looks like, and that seems to dominate all but gigantic moves, for which rep; movs
> is actually no slower than even the trickiest use of ymm registers.
> 
> - erik
> 





* Re: [9fans] Go: FP in note handler
  2016-02-23 10:17 ` Kenny Lasse Hoff Levinsen
@ 2016-02-23 15:27   ` erik quanstrom
  2016-02-23 15:52     ` Kenny Lasse Hoff Levinsen
  0 siblings, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2016-02-23 15:27 UTC (permalink / raw)
  To: 9fans

On Tue Feb 23 02:36:41 PST 2016, kennylevinsen@gmail.com wrote:
> Ah, no - it is not a system-wide adjustment, but an adjustment of the plan9-specific runtime.sighandler implementation and everything called by it directly. Notes that don't exit the process are queued and should run outside the actual note handler.
> 
> I think the "magic" code will be isolated, and might fend off accidental future additions of floating point registers. The magic-ness also only revolves around avoiding duffzero and duffcopy in some way. I also think that removing conditionals in the compiler will be a positive thing.
> 
> I still do not know the feasibility of my plan, whether it is possible to do cleanly, or possible at all. Maybe someone smarter than me with knowledge on the matter could chime in and call me an idiot?
> 
> Avoiding duffcopy should be easy with a simple memmove implementation. If done right, we can also remove the plan9-specific runtime.memmove and only use the slow memmove in sighandler. (The global runtime.memmove is implemented using MOVUPS, just like duffcopy. Duffcopy is used for block copies by the compiler in some cases, although I must admit to not knowing all the cases yet.)
> 
> Avoiding duffzero without compiler assistance is a bit more tricky - global variables, stack on assembly functions, something like that.

fwiw, on modern amd64 machines, using the xmm and ymm registers has a benefit only in a narrow range
of sizes (384-511 bytes) and a subset of (mis-)alignments that i've forgotten.  at least for the exact test setup
i used on 3-4 different µarches.  intel claims rep; movs is the (architecturally) fastest way to go.

i am not sure any of this makes much difference, as it's hard to know what a real-world memory
access pattern looks like, and that seems to dominate all but gigantic moves, for which rep; movs
is actually no slower than even the trickiest use of ymm registers.
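
For anyone who wants to repeat this kind of measurement, here is a rough sketch of timing copies over a range of sizes in Go. It is not the test setup described above: it times whatever the built-in copy() does on the machine at hand, it does not control alignment, and the function name copyRate, the size list, and the 64 MiB budget per size are all arbitrary assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// copyRate (hypothetical helper) times iters copies of size bytes using the
// built-in copy and returns the throughput in bytes per nanosecond.
func copyRate(size, iters int) float64 {
	src := make([]byte, size)
	dst := make([]byte, size)
	start := time.Now()
	for i := 0; i < iters; i++ {
		copy(dst, src)
	}
	elapsed := time.Since(start)
	if elapsed <= 0 {
		return 0 // timer too coarse for this run
	}
	return float64(size) * float64(iters) / float64(elapsed.Nanoseconds())
}

func main() {
	// Sizes chosen to straddle the 384-511 byte range mentioned above.
	for _, size := range []int{64, 256, 384, 448, 511, 512, 4096, 1 << 20} {
		iters := (64 << 20) / size // move roughly 64 MiB per size
		fmt.Printf("%8d bytes: %.2f bytes/ns\n", size, copyRate(size, iters))
	}
}
```

The buffers stay hot in cache for the small sizes, so this says little about real-world access patterns, which is exactly the caveat raised above.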

- erik




* Re: [9fans] Go: FP in note handler
       [not found] <e115154c5bab8971d7b88f64ba5d4402@proxima.alt.za>
@ 2016-02-23 10:17 ` Kenny Lasse Hoff Levinsen
  2016-02-23 15:27   ` erik quanstrom
  0 siblings, 1 reply; 9+ messages in thread
From: Kenny Lasse Hoff Levinsen @ 2016-02-23 10:17 UTC (permalink / raw)
  To: 9fans

Ah, no - it is not a system-wide adjustment, but an adjustment of the plan9-specific runtime.sighandler implementation and everything called by it directly. Notes that don't exit the process are queued and should run outside the actual note handler.

I think the "magic" code will be isolated, and might fend off accidental future additions of floating point registers. The magic-ness also only revolves around avoiding duffzero and duffcopy in some way. I also think that removing conditionals in the compiler will be a positive thing.

I still do not know the feasibility of my plan, whether it is possible to do cleanly, or possible at all. Maybe someone smarter than me with knowledge on the matter could chime in and call me an idiot?

Avoiding duffcopy should be easy with a simple memmove implementation. If done right, we can also remove the plan9-specific runtime.memmove and only use the slow memmove in sighandler. (The global runtime.memmove is implemented using MOVUPS, just like duffcopy. Duffcopy is used for block copies by the compiler in some cases, although I must admit to not knowing all the cases yet.)

Avoiding duffzero without compiler assistance is a bit more tricky - global variables, stack on assembly functions, something like that.

Best regards,
Kenny Levinsen

On 23 Feb 2016, at 10:05, lucio@proxima.alt.za wrote:

>> Well, avoiding XMM registers in duffcopy/duffzero is one solution, but
>> I was thinking of working around them entirely in code called from the
>> note handler, so that duffcopy/duffzero can operate as intended on
>> plan9, rather than littering the compiler with OS conditionals.
> 
> Do you think you'll be able to sell that to the Go developers?  You
> ARE talking about a system-wide adjustment and it seems to me that it
> will need constant supervision to be maintained.  Again, I may have
> misunderstood, but it does seem like a maintenance nightmare to me.
> 
> As for:
> 
>> To fix the duffzero, we'd have to fix runtime.goexitsall's buffer
>> usage, but to reenable duffcopy, we'd have to look at the much bigger
>> runtime.sighandler.
> 
> That is undeniable, but to avoid a different type of maintenance
> nightmare, may be the only option.  Although "fixing" duffcopy and
> duffzero would seem a better, if less efficient option.
> 
> Still, it's the opinion of a none-too-well-informed spectator, do not
> let me spoil it for you.  In particular, I'm sure I'm not telling you
> anything you have not already considered.
> 
> Lucio.
> 
> PS: I do think that it is our responsibility to track each and every
> aspect of Go where Plan 9 demands special treatment.  Ideally, this
> means build flags or specially named modules and a commitment from a
> few of us to keep these in sync.  Anything else becomes someone else's
> responsibility and that is risky.
> 


* Re: [9fans] Go: FP in note handler
       [not found] <83031ee52a34de15facd95dcbcabbad9@proxima.alt.za>
@ 2016-02-23  8:26 ` Kenny Lasse Hoff Levinsen
  0 siblings, 0 replies; 9+ messages in thread
From: Kenny Lasse Hoff Levinsen @ 2016-02-23  8:26 UTC (permalink / raw)
  To: 9fans; +Cc: lucio

Well, avoiding XMM registers in duffcopy/duffzero is one solution, but I was thinking of working around them entirely in code called from the note handler, so that duffcopy/duffzero can operate as intended on plan9, rather than littering the compiler with OS conditionals.

It puts some restrictions on the note handling code, such as no copy(), make() or even an on-stack var b [n]byte. Due to sighandler disabling write barriers, we can't currently allocate on the heap, meaning that we might need either locked global buffers (which can be duffzeroed) or more assembly so we can use on-stack buffers (which could be zeroed if we wanted to, they just can't use duffzero for it).
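
To make the "locked global buffer" option concrete, here is a rough sketch under several assumptions. The names noteMu, noteBuf, appendBytes and buildExitNote are hypothetical, the "go: exit " prefix is only illustrative, and sync.Mutex stands in for whatever lock the runtime would really use (runtime code cannot use package sync). The point is only the shape: no copy(), no make(), and no fresh on-stack array inside the function. Whether the compiled result really avoids duffzero/duffcopy would still need checking in the disassembly.

```go
package main

import (
	"fmt"
	"sync"
)

// errMax is the assumed note buffer size (Plan 9's ERRMAX is 128).
const errMax = 128

var (
	noteMu  sync.Mutex   // stand-in for a runtime-internal lock (assumption)
	noteBuf [errMax]byte // global, so it is not zeroed on the stack per call
)

// appendBytes (hypothetical) writes s into buf starting at off, one byte at
// a time, always leaving room for a trailing NUL. It returns the new offset.
func appendBytes(buf *[errMax]byte, off int, s string) int {
	for i := 0; i < len(s) && off < errMax-1; i++ {
		buf[off] = s[i]
		off++
	}
	return off
}

// buildExitNote (hypothetical) builds a NUL-terminated note in the shared
// buffer and hands it to post while the lock is held.
func buildExitNote(status string, post func(note []byte)) {
	noteMu.Lock()
	n := appendBytes(&noteBuf, 0, "go: exit ")
	n = appendBytes(&noteBuf, n, status)
	noteBuf[n] = 0
	post(noteBuf[:n+1])
	noteMu.Unlock()
}

func main() {
	buildExitNote("interrupted", func(note []byte) {
		fmt.Printf("%q\n", note)
	})
}
```

The trade-off is the one described above: the global buffer needs a lock, whereas an on-stack buffer would need assembly (or some other trick) to avoid being zeroed by duffzero.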

To fix the duffzero, we'd have to fix runtime.goexitsall's buffer usage, but to reenable duffcopy, we'd have to look at the much bigger runtime.sighandler.

Best regards,
Kenny Levinsen

On 23 Feb 2016, at 08:20, lucio@proxima.alt.za wrote:

>> Duffcopy is disabled on plan9 after the last bug report on the
>> matter, but duffzero was later optimized to use XMM registers, causing
>> goexitsall, which uses an on-stack byte array to make a new note, to
>> call duffzero and trip the fp in note handler message.
> 
> I had to re-read this to understand this because you tend to put at
> the end what I would find easier to understand if it was at the
> beginning.  No offence meant, different punctuation would have perhaps
> helped my understanding.
> 
> So, we need a duffcopy and duffzero that do not use XMM registers,
> rather than stop invoking them, if I read your comment correctly?
> 
> I also have an open issue (I see David has offered to look into it
> soon) involving syscalls and their error messages, it seems these are
> all Plan 9 specific issues that could be addressed together.
> 
> I really would like to take a more active role in Go for Plan 9, but I
> can't yet give it the priority I'd like.  Still, I like hearing from
> others who take this to heart.
> 
> Lucio.
> 

