* [9front] qemu clock/timer issue @ 2022-02-05 2:04 Michael Forney 2022-02-05 17:04 ` hiro 2022-02-05 18:14 ` Nick Owens 0 siblings, 2 replies; 5+ messages in thread From: Michael Forney @ 2022-02-05 2:04 UTC (permalink / raw) To: 9front I was recently running 9front in a VM on linux (qemu, with -enable-kvm), and noticed several weird timing/clock issues. The date command gives wildly varying results +- 1 minute: cpu% date Fri Feb 4 17:39:23 PST 2022 cpu% date Fri Feb 4 17:39:12 PST 2022 cpu% date Fri Feb 4 17:40:19 PST 2022 cpu% date Fri Feb 4 17:39:18 PST 2022 `sleep 1` exits pretty much instantly, stats moves through the entire width in a second or two, and clicking in scrollbars starts repeating immediately. I notice the following messages in my dmesg: kvm: vcpu 3: requested 38141 ns lapic timer period limited to 200000 ns kvm: vcpu 1: requested 9449 ns lapic timer period limited to 200000 ns kvm: vcpu 2: requested 51204 ns lapic timer period limited to 200000 ns kvm: vcpu 0: requested 74421 ns lapic timer period limited to 200000 ns I don't remember this problem last time I ran 9front in qemu (around a year ago), so I looked through commit history and found one commit that seemed like it might be related: http://git.9front.org/plan9front/plan9front/a05bab362f66ddd6fa65f2e7cda9eaaa0217ec08/commit.html To check, I tried setting *nohpet=1 in my plan9.ini, which *seems* to have solved the problem, though I still see those same messages in dmesg on the linux host (but now the requested lapic timer period is always around 100000 ns or greater). Anyone know what's going on here? ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [9front] qemu clock/timer issue 2022-02-05 2:04 [9front] qemu clock/timer issue Michael Forney @ 2022-02-05 17:04 ` hiro 2022-02-05 18:14 ` Nick Owens 1 sibling, 0 replies; 5+ messages in thread From: hiro @ 2022-02-05 17:04 UTC (permalink / raw) To: 9front do you have timesync running twice? check in ps output. On 2/5/22, Michael Forney <mforney@mforney.org> wrote: > I was recently running 9front in a VM on linux (qemu, with > -enable-kvm), and noticed several weird timing/clock issues. The date > command gives wildly varying results +- 1 minute: > > cpu% date > Fri Feb 4 17:39:23 PST 2022 > cpu% date > Fri Feb 4 17:39:12 PST 2022 > cpu% date > Fri Feb 4 17:40:19 PST 2022 > cpu% date > Fri Feb 4 17:39:18 PST 2022 > > `sleep 1` exits pretty much instantly, stats moves through the entire > width in a second or two, and clicking in scrollbars starts repeating > immediately. > > I notice the following messages in my dmesg: > kvm: vcpu 3: requested 38141 ns lapic timer period limited to 200000 ns > kvm: vcpu 1: requested 9449 ns lapic timer period limited to 200000 ns > kvm: vcpu 2: requested 51204 ns lapic timer period limited to 200000 ns > kvm: vcpu 0: requested 74421 ns lapic timer period limited to 200000 ns > > I don't remember this problem last time I ran 9front in qemu (around a > year ago), so I looked through commit history and found one commit > that seemed like it might be related: > http://git.9front.org/plan9front/plan9front/a05bab362f66ddd6fa65f2e7cda9eaaa0217ec08/commit.html > > To check, I tried setting *nohpet=1 in my plan9.ini, which *seems* to > have solved the problem, though I still see those same messages in > dmesg on the linux host (but now the requested lapic timer period is > always around 100000 ns or greater). > > Anyone know what's going on here? > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [9front] qemu clock/timer issue 2022-02-05 2:04 [9front] qemu clock/timer issue Michael Forney 2022-02-05 17:04 ` hiro @ 2022-02-05 18:14 ` Nick Owens 2022-02-18 3:25 ` Michael Forney 1 sibling, 1 reply; 5+ messages in thread From: Nick Owens @ 2022-02-05 18:14 UTC (permalink / raw) To: 9front kvm emulates a local APIC timer which the 9front kernel uses. the timer period is clamped to a minimum period, and my understanding is that the purpose is to protect the host from CPU consumption by guests, but i don't have a citation for that. 9front requests a timer period shorter than what the default minimum is and kvm warns about it. it maybe needs more investigation as to whether we really need such short timers. in the meantime, you can lower the timer period minimum with echo "options kvm min_timer_period_us=10" >> /etc/modprobe.d/kvm.conf which is 10000ns. it won't kill all of the warnings since sometimes the timer period is slightly shorter, but it will help with the dmesg spam. On Sat, Feb 5, 2022 at 7:30 AM Michael Forney <mforney@mforney.org> wrote: > > I was recently running 9front in a VM on linux (qemu, with > -enable-kvm), and noticed several weird timing/clock issues. The date > command gives wildly varying results +- 1 minute: > > cpu% date > Fri Feb 4 17:39:23 PST 2022 > cpu% date > Fri Feb 4 17:39:12 PST 2022 > cpu% date > Fri Feb 4 17:40:19 PST 2022 > cpu% date > Fri Feb 4 17:39:18 PST 2022 > > `sleep 1` exits pretty much instantly, stats moves through the entire > width in a second or two, and clicking in scrollbars starts repeating > immediately. > > I notice the following messages in my dmesg: > kvm: vcpu 3: requested 38141 ns lapic timer period limited to 200000 ns > kvm: vcpu 1: requested 9449 ns lapic timer period limited to 200000 ns > kvm: vcpu 2: requested 51204 ns lapic timer period limited to 200000 ns > kvm: vcpu 0: requested 74421 ns lapic timer period limited to 200000 ns > > I don't remember this problem last time I ran 9front in qemu (around a > year ago), so I looked through commit history and found one commit > that seemed like it might be related: > http://git.9front.org/plan9front/plan9front/a05bab362f66ddd6fa65f2e7cda9eaaa0217ec08/commit.html > > To check, I tried setting *nohpet=1 in my plan9.ini, which *seems* to > have solved the problem, though I still see those same messages in > dmesg on the linux host (but now the requested lapic timer period is > always around 100000 ns or greater). > > Anyone know what's going on here? ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [9front] qemu clock/timer issue 2022-02-05 18:14 ` Nick Owens @ 2022-02-18 3:25 ` Michael Forney 2022-02-21 22:35 ` cinap_lenrek 0 siblings, 1 reply; 5+ messages in thread From: Michael Forney @ 2022-02-18 3:25 UTC (permalink / raw) To: 9front On 2022-02-05, Nick Owens <mischief@offblast.org> wrote: > kvm emulates a local APIC timer which the 9front kernel uses. the > timer period is clamped to a minimum period, and my understanding is > that the purpose is to protect the host from CPU consumption by > guests, but i don't have a citation for that. 9front requests a timer > period shorter than what the default minimum is and kvm warns about > it. it maybe needs more investigation as to whether we really need > such short timers. > > in the meantime, you can lower the timer period minimum with > > echo "options kvm min_timer_period_us=10" >> /etc/modprobe.d/kvm.conf > > which is 10000ns. it won't kill all of the warnings since sometimes > the timer period is slightly shorter, but it will help with the dmesg > spam. Thanks for the info! I tried lowering to min_timer_period_us=5 and even then it seems that 9front requests even shorter lapic timers: kvm: vcpu 2: requested 828 ns lapic timer period limited to 5000 ns kvm: vcpu 0: requested 828 ns lapic timer period limited to 5000 ns kvm: vcpu 3: requested 3822 ns lapic timer period limited to 5000 ns kvm: vcpu 1: requested 2526 ns lapic timer period limited to 5000 ns I did some more debugging and I believe the issue is with CPU frequency detection using hpetcpufreq(). I should have noticed this earlier, but without *nohpet=1, 9front detects the CPUs as having a very low clock rate, for example 27 MHz, when it should be 3.3 GHz. Nothing in the code seemed wrong to me, so I added some debug prints to try to figure out where detection was going wrong. Here's a boot log with HPET, detecting the wrong frequency: Plan 9 HPET: fed00000 8086a201 100 MHz loops=1000 cycles 17384 [13196648 13214032] hpet 502 [394 896] loops=2000 cycles 42652 [56658413 56701065] hpet 643 [1318776 1319419] <snip> loops=38000 cycles 477510 [2494452748 2494930258] hpet 13729 [75232811 75246540] loops=39000 cycles 491293 [2781515810 2782007103] hpet 14149 [83936532 83950681] loops=40000 cycles 498652 [2935179496 2935678148] hpet 1986869 [88595551 90582420] 498652 cycles 1986869 hpet ticks cpuhz=25097376 cpumhz=25 cyclefreq=25097376 loopconst=1930 125 holes free 0x00026000 0x0009f000 495616 0x00100000 0x00110000 65536 0x005a1000 0x7ffff000 2141577216 2142138368 bytes free pcirouting: PCI.0.1.3 at pin 1 link 60 irq 9 -> 10 cpu0: 25MHz GenuineIntel P6 (AX 000306F2 CX F7FAB223 DX 1F8BFBFF) LAPIC: fee00000 0xfffffe80fee00000 ELCR: 0C00 cpu0: lapic clock at 8MHz cpu1: 25MHz GenuineIntel P6 (AX 000306F2 CX F7FAB223 DX 1F8BFBFF) cpu2: 25MHz GenuineIntel P6 (AX 000306F2 CX F7FAB223 DX 1F8BFBFF) cpu3: 25MHz GenuineIntel P6 (AX 000306F2 CX F7FAB223 DX 1F8BFBFF) It seems that the hpetcpufreq loop is seeing a huge jump in hpet ticks at some point, causing the loop to stop prematurely. Looking through qemu source, I believe the HPET is emulated with it's "QEMU_CLOCK_VIRTUAL", which uses the host's clock_gettime(CLOCK_MONOTONIC), taking care to save offsets so it doesn't increase in between vm_stop and vm_start. I confirmed that vm_stop is never called while reproducing the bug, which means that reading the HPET counter is effectively doing clock_gettime(CLOCK_MONOTONIC)/10 (100 MHz, so 10 ns period). As I was debugging, I noticed that it doesn't happen all of the time, and I can usually trigger it by resizing the qemu window. When I run with -display none, I do not see the issue. I am not quite sure how TSC works under VT-x (and my searches weren't too helpful), but if is only counting cycles of the virtual CPUs, while HPET is counting real time on the host, that could explain this result. Perhaps when the qemu window is resized, the virtual CPU stops running for some period, but the HPET keeps ticking. Then, when it is resumed, 9front sees a huge jump in HPET ticks, with a comparatively small number of clock cycles, and deduces from this that the CPU clock speed is very slow. This leaves me with a few questions: - Is there a more robust method for determining CPU clock rate? - Why does i8253cpufreq() work with qemu? As far as I can tell, they are both implemented in qemu using the same clock, but it seems that it is immune to the resize trick. Maybe this is something I need to ask qemu developers. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [9front] qemu clock/timer issue 2022-02-18 3:25 ` Michael Forney @ 2022-02-21 22:35 ` cinap_lenrek 0 siblings, 0 replies; 5+ messages in thread From: cinap_lenrek @ 2022-02-21 22:35 UTC (permalink / raw) To: 9front Nice writeup. Thank you for this investigation. > Maybe this is something I need to ask qemu developers. Yes, please try. Maybe we are missing some instruction barriers when reading the tsc counter? The issue is hard to reproduce for me, i have no issues with qemu or real hardware. I guess it also depends on the host cpu and linux scheduling and the moon phase :( Anyway, dont give up. It would be very nice to get to the bottom of this. -- cinap ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-02-21 22:40 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-02-05 2:04 [9front] qemu clock/timer issue Michael Forney 2022-02-05 17:04 ` hiro 2022-02-05 18:14 ` Nick Owens 2022-02-18 3:25 ` Michael Forney 2022-02-21 22:35 ` cinap_lenrek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).