* [9front] qemu clock/timer issue
@ 2022-02-05 2:04 Michael Forney
2022-02-05 17:04 ` hiro
2022-02-05 18:14 ` Nick Owens
0 siblings, 2 replies; 5+ messages in thread
From: Michael Forney @ 2022-02-05 2:04 UTC (permalink / raw)
To: 9front
I was recently running 9front in a VM on linux (qemu, with
-enable-kvm), and noticed several weird timing/clock issues. The date
command gives wildly varying results +- 1 minute:
cpu% date
Fri Feb 4 17:39:23 PST 2022
cpu% date
Fri Feb 4 17:39:12 PST 2022
cpu% date
Fri Feb 4 17:40:19 PST 2022
cpu% date
Fri Feb 4 17:39:18 PST 2022
`sleep 1` exits pretty much instantly, stats moves through the entire
width in a second or two, and clicking in scrollbars starts repeating
immediately.
I notice the following messages in my dmesg:
kvm: vcpu 3: requested 38141 ns lapic timer period limited to 200000 ns
kvm: vcpu 1: requested 9449 ns lapic timer period limited to 200000 ns
kvm: vcpu 2: requested 51204 ns lapic timer period limited to 200000 ns
kvm: vcpu 0: requested 74421 ns lapic timer period limited to 200000 ns
I don't remember this problem last time I ran 9front in qemu (around a
year ago), so I looked through commit history and found one commit
that seemed like it might be related:
http://git.9front.org/plan9front/plan9front/a05bab362f66ddd6fa65f2e7cda9eaaa0217ec08/commit.html
To check, I tried setting *nohpet=1 in my plan9.ini, which *seems* to
have solved the problem, though I still see those same messages in
dmesg on the linux host (but now the requested lapic timer period is
always around 100000 ns or greater).
Anyone know what's going on here?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [9front] qemu clock/timer issue
2022-02-05 2:04 [9front] qemu clock/timer issue Michael Forney
@ 2022-02-05 17:04 ` hiro
2022-02-05 18:14 ` Nick Owens
1 sibling, 0 replies; 5+ messages in thread
From: hiro @ 2022-02-05 17:04 UTC (permalink / raw)
To: 9front
do you have timesync running twice? check in ps output.
On 2/5/22, Michael Forney <mforney@mforney.org> wrote:
> I was recently running 9front in a VM on linux (qemu, with
> -enable-kvm), and noticed several weird timing/clock issues. The date
> command gives wildly varying results +- 1 minute:
>
> cpu% date
> Fri Feb 4 17:39:23 PST 2022
> cpu% date
> Fri Feb 4 17:39:12 PST 2022
> cpu% date
> Fri Feb 4 17:40:19 PST 2022
> cpu% date
> Fri Feb 4 17:39:18 PST 2022
>
> `sleep 1` exits pretty much instantly, stats moves through the entire
> width in a second or two, and clicking in scrollbars starts repeating
> immediately.
>
> I notice the following messages in my dmesg:
> kvm: vcpu 3: requested 38141 ns lapic timer period limited to 200000 ns
> kvm: vcpu 1: requested 9449 ns lapic timer period limited to 200000 ns
> kvm: vcpu 2: requested 51204 ns lapic timer period limited to 200000 ns
> kvm: vcpu 0: requested 74421 ns lapic timer period limited to 200000 ns
>
> I don't remember this problem last time I ran 9front in qemu (around a
> year ago), so I looked through commit history and found one commit
> that seemed like it might be related:
> http://git.9front.org/plan9front/plan9front/a05bab362f66ddd6fa65f2e7cda9eaaa0217ec08/commit.html
>
> To check, I tried setting *nohpet=1 in my plan9.ini, which *seems* to
> have solved the problem, though I still see those same messages in
> dmesg on the linux host (but now the requested lapic timer period is
> always around 100000 ns or greater).
>
> Anyone know what's going on here?
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [9front] qemu clock/timer issue
2022-02-05 2:04 [9front] qemu clock/timer issue Michael Forney
2022-02-05 17:04 ` hiro
@ 2022-02-05 18:14 ` Nick Owens
2022-02-18 3:25 ` Michael Forney
1 sibling, 1 reply; 5+ messages in thread
From: Nick Owens @ 2022-02-05 18:14 UTC (permalink / raw)
To: 9front
kvm emulates a local APIC timer which the 9front kernel uses. the
timer period is clamped to a minimum period, and my understanding is
that the purpose is to protect the host from CPU consumption by
guests, but i don't have a citation for that. 9front requests a timer
period shorter than what the default minimum is and kvm warns about
it. it maybe needs more investigation as to whether we really need
such short timers.
in the meantime, you can lower the timer period minimum with
echo "options kvm min_timer_period_us=10" >> /etc/modprobe.d/kvm.conf
which is 10000ns. it won't kill all of the warnings since sometimes
the timer period is slightly shorter, but it will help with the dmesg
spam.
On Sat, Feb 5, 2022 at 7:30 AM Michael Forney <mforney@mforney.org> wrote:
>
> I was recently running 9front in a VM on linux (qemu, with
> -enable-kvm), and noticed several weird timing/clock issues. The date
> command gives wildly varying results +- 1 minute:
>
> cpu% date
> Fri Feb 4 17:39:23 PST 2022
> cpu% date
> Fri Feb 4 17:39:12 PST 2022
> cpu% date
> Fri Feb 4 17:40:19 PST 2022
> cpu% date
> Fri Feb 4 17:39:18 PST 2022
>
> `sleep 1` exits pretty much instantly, stats moves through the entire
> width in a second or two, and clicking in scrollbars starts repeating
> immediately.
>
> I notice the following messages in my dmesg:
> kvm: vcpu 3: requested 38141 ns lapic timer period limited to 200000 ns
> kvm: vcpu 1: requested 9449 ns lapic timer period limited to 200000 ns
> kvm: vcpu 2: requested 51204 ns lapic timer period limited to 200000 ns
> kvm: vcpu 0: requested 74421 ns lapic timer period limited to 200000 ns
>
> I don't remember this problem last time I ran 9front in qemu (around a
> year ago), so I looked through commit history and found one commit
> that seemed like it might be related:
> http://git.9front.org/plan9front/plan9front/a05bab362f66ddd6fa65f2e7cda9eaaa0217ec08/commit.html
>
> To check, I tried setting *nohpet=1 in my plan9.ini, which *seems* to
> have solved the problem, though I still see those same messages in
> dmesg on the linux host (but now the requested lapic timer period is
> always around 100000 ns or greater).
>
> Anyone know what's going on here?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [9front] qemu clock/timer issue
2022-02-05 18:14 ` Nick Owens
@ 2022-02-18 3:25 ` Michael Forney
2022-02-21 22:35 ` cinap_lenrek
0 siblings, 1 reply; 5+ messages in thread
From: Michael Forney @ 2022-02-18 3:25 UTC (permalink / raw)
To: 9front
On 2022-02-05, Nick Owens <mischief@offblast.org> wrote:
> kvm emulates a local APIC timer which the 9front kernel uses. the
> timer period is clamped to a minimum period, and my understanding is
> that the purpose is to protect the host from CPU consumption by
> guests, but i don't have a citation for that. 9front requests a timer
> period shorter than what the default minimum is and kvm warns about
> it. it maybe needs more investigation as to whether we really need
> such short timers.
>
> in the meantime, you can lower the timer period minimum with
>
> echo "options kvm min_timer_period_us=10" >> /etc/modprobe.d/kvm.conf
>
> which is 10000ns. it won't kill all of the warnings since sometimes
> the timer period is slightly shorter, but it will help with the dmesg
> spam.
Thanks for the info! I tried lowering to min_timer_period_us=5 and
even then it seems that 9front requests even shorter lapic timers:
kvm: vcpu 2: requested 828 ns lapic timer period limited to 5000 ns
kvm: vcpu 0: requested 828 ns lapic timer period limited to 5000 ns
kvm: vcpu 3: requested 3822 ns lapic timer period limited to 5000 ns
kvm: vcpu 1: requested 2526 ns lapic timer period limited to 5000 ns
I did some more debugging and I believe the issue is with CPU
frequency detection using hpetcpufreq(). I should have noticed this
earlier, but without *nohpet=1, 9front detects the CPUs as having a
very low clock rate, for example 27 MHz, when it should be 3.3 GHz.
Nothing in the code seemed wrong to me, so I added some debug prints
to try to figure out where detection was going wrong.
Here's a boot log with HPET, detecting the wrong frequency:
Plan 9
HPET: fed00000 8086a201 100 MHz
loops=1000
cycles 17384 [13196648 13214032]
hpet 502 [394 896]
loops=2000
cycles 42652 [56658413 56701065]
hpet 643 [1318776 1319419]
<snip>
loops=38000
cycles 477510 [2494452748 2494930258]
hpet 13729 [75232811 75246540]
loops=39000
cycles 491293 [2781515810 2782007103]
hpet 14149 [83936532 83950681]
loops=40000
cycles 498652 [2935179496 2935678148]
hpet 1986869 [88595551 90582420]
498652 cycles 1986869 hpet ticks
cpuhz=25097376 cpumhz=25 cyclefreq=25097376 loopconst=1930
125 holes free
0x00026000 0x0009f000 495616
0x00100000 0x00110000 65536
0x005a1000 0x7ffff000 2141577216
2142138368 bytes free
pcirouting: PCI.0.1.3 at pin 1 link 60 irq 9 -> 10
cpu0: 25MHz GenuineIntel P6 (AX 000306F2 CX F7FAB223 DX 1F8BFBFF)
LAPIC: fee00000 0xfffffe80fee00000
ELCR: 0C00
cpu0: lapic clock at 8MHz
cpu1: 25MHz GenuineIntel P6 (AX 000306F2 CX F7FAB223 DX 1F8BFBFF)
cpu2: 25MHz GenuineIntel P6 (AX 000306F2 CX F7FAB223 DX 1F8BFBFF)
cpu3: 25MHz GenuineIntel P6 (AX 000306F2 CX F7FAB223 DX 1F8BFBFF)
It seems that the hpetcpufreq loop is seeing a huge jump in hpet ticks
at some point, causing the loop to stop prematurely. Looking through
qemu source, I believe the HPET is emulated with it's
"QEMU_CLOCK_VIRTUAL", which uses the host's
clock_gettime(CLOCK_MONOTONIC), taking care to save offsets so it
doesn't increase in between vm_stop and vm_start. I confirmed that
vm_stop is never called while reproducing the bug, which means that
reading the HPET counter is effectively doing
clock_gettime(CLOCK_MONOTONIC)/10 (100 MHz, so 10 ns period).
As I was debugging, I noticed that it doesn't happen all of the time,
and I can usually trigger it by resizing the qemu window. When I run
with -display none, I do not see the issue.
I am not quite sure how TSC works under VT-x (and my searches weren't
too helpful), but if is only counting cycles of the virtual CPUs,
while HPET is counting real time on the host, that could explain this
result. Perhaps when the qemu window is resized, the virtual CPU stops
running for some period, but the HPET keeps ticking. Then, when it is
resumed, 9front sees a huge jump in HPET ticks, with a comparatively
small number of clock cycles, and deduces from this that the CPU clock
speed is very slow.
This leaves me with a few questions:
- Is there a more robust method for determining CPU clock rate?
- Why does i8253cpufreq() work with qemu? As far as I can tell, they
are both implemented in qemu using the same clock, but it seems that
it is immune to the resize trick.
Maybe this is something I need to ask qemu developers.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [9front] qemu clock/timer issue
2022-02-18 3:25 ` Michael Forney
@ 2022-02-21 22:35 ` cinap_lenrek
0 siblings, 0 replies; 5+ messages in thread
From: cinap_lenrek @ 2022-02-21 22:35 UTC (permalink / raw)
To: 9front
Nice writeup. Thank you for this investigation.
> Maybe this is something I need to ask qemu developers.
Yes, please try.
Maybe we are missing some instruction barriers when
reading the tsc counter?
The issue is hard to reproduce for me, i have no issues
with qemu or real hardware. I guess it also depends on
the host cpu and linux scheduling and the moon phase :(
Anyway, dont give up. It would be very nice to get to
the bottom of this.
--
cinap
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-02-21 22:40 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-05 2:04 [9front] qemu clock/timer issue Michael Forney
2022-02-05 17:04 ` hiro
2022-02-05 18:14 ` Nick Owens
2022-02-18 3:25 ` Michael Forney
2022-02-21 22:35 ` cinap_lenrek
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).