9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] No regression tests
@ 2014-03-25  5:50 Adriano Verardo
  2014-03-25 12:33 ` erik quanstrom
  0 siblings, 1 reply; 9+ messages in thread
From: Adriano Verardo @ 2014-03-25  5:50 UTC
  To: Fans of the OS Plan 9 from Bell Labs

A few weeks ago I wrote about an unkillable manager of USB barcode readers.
That code had worked perfectly for 5+ years, with absolutely no changes.

IMHO the problem seems to be caused by a change in the Bell kernel sources,
as under 9Atom everything works as expected.

Unfortunately I can't say which was the last working release, because the
problem was first noticed a few weeks ago, while the kernel is rebuilt
frequently and the sources are updated irregularly, 3 or 4 times a year.

adriano




* Re: [9fans] No regression tests
  2014-03-25  5:50 [9fans] No regression tests Adriano Verardo
@ 2014-03-25 12:33 ` erik quanstrom
  2014-03-25 23:11   ` Adriano Verardo
  0 siblings, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2014-03-25 12:33 UTC
  To: 9fans

On Tue Mar 25 01:51:36 EDT 2014, adriano.verardo@mail.com wrote:
> A few weeks ago I wrote about an unkillable manager of USB barcode
> readers.  That code had worked perfectly for 5+ years, with absolutely
> no changes.
>
> IMHO the problem seems to be caused by a change in the Bell kernel
> sources, as under 9Atom everything works as expected.
>
> Unfortunately I can't say which was the last working release, because
> the problem was first noticed a few weeks ago, while the kernel is
> rebuilt frequently and the sources are updated irregularly, 3 or 4
> times a year.

that's interesting.  what state are these processes in, and what are
the backtraces?

- erik




* Re: [9fans] No regression tests
  2014-03-25 12:33 ` erik quanstrom
@ 2014-03-25 23:11   ` Adriano Verardo
  2014-03-26 14:46     ` erik quanstrom
  0 siblings, 1 reply; 9+ messages in thread
From: Adriano Verardo @ 2014-03-25 23:11 UTC
  To: Fans of the OS Plan 9 from Bell Labs

erik quanstrom wrote:
> On Tue Mar 25 01:51:36 EDT 2014, adriano.verardo@mail.com wrote:
>> A few weeks ago I wrote about an unkillable manager of USB barcode
>> readers.  That code had worked perfectly for 5+ years, with absolutely
>> no changes.
>>
>> IMHO the problem seems to be caused by a change in the Bell kernel
>> sources, as under 9Atom everything works as expected.
>>
>> Unfortunately I can't say which was the last working release, because
>> the problem was first noticed a few weeks ago, while the kernel is
>> rebuilt frequently and the sources are updated irregularly, 3 or 4
>> times a year.
> that's interesting.  what state are these processes in, and what are
> the backtraces?
The task is basically a customized keyboard manager which
opens a channel in /srv. While it is running, ps shows 4 instances, as it
is started by usbd and forks 3 times.

When the reader is unplugged, all four processes should terminate.
On Bell, for some time now, only three die. Then, when the reader is plugged
in again, a spurious process is left over which prevents the four new ones
from working.

Neither kill nor slay works; the only solution is a reboot.

Internal debug prints (#ifdef, no code changes) show exactly the same
output under Bell and Atom. In both cases, when the reader is unplugged, the
manager reports the condition and reports that it is terminating, but under
Bell the termination doesn't actually happen.

I regret not having more detailed info. I suspect that something
changed in the detach primitives or thereabouts. But it's only a very
personal opinion.

adriano






* Re: [9fans] No regression tests
  2014-03-25 23:11   ` Adriano Verardo
@ 2014-03-26 14:46     ` erik quanstrom
  2014-03-26 18:26       ` Adriano Verardo
  0 siblings, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2014-03-26 14:46 UTC
  To: 9fans

> I regret not having more detailed info. I suspect that something
> changed in the detach primitives or thereabouts. But it's only a very
> personal opinion.

hmm.  would it be too much to ask for a ps of the processes that
failed to exit?  i really would just like to know what state they're in.
i think this may have been a latent bug that just came out.

- erik




* Re: [9fans] No regression tests
  2014-03-26 14:46     ` erik quanstrom
@ 2014-03-26 18:26       ` Adriano Verardo
  2014-03-26 19:13         ` erik quanstrom
  0 siblings, 1 reply; 9+ messages in thread
From: Adriano Verardo @ 2014-03-26 18:26 UTC
  To: Fans of the OS Plan 9 from Bell Labs

erik quanstrom wrote:
>> I regret not having more detailed info. I suspect that something
>> changed in the detach primitives or thereabouts. But it's only a very
>> personal opinion.
> hmm.  would it be too much to ask for a ps of the processes that
> failed to exit?  i really would just like to know what state they're in.
> i think this may have been a latent bug that just came out.
>
> - erik
I'm working on a Bell installation I have at home, downloaded a few weeks ago.
The kernel is built using the same config used in the field,
where the wrong behaviour was noticed.
The modified usbd and the bcscan process are embedded.

After booting with the reader plugged in (normal condition):

bootes           12    0:00   0:00      336K Pread    bcscan
bootes           13    0:00   0:00      336K Rendez   bcscan
bootes           14    0:00   0:00      336K Rendez   bcscan
bootes           19    0:00   0:00      336K Pread    bcscan

Here mount /srv/bcscan /n/bc gives a readable /n/bc/bcU0/data.

Then the reader is unplugged

bootes           12    0:00   0:00      336K Pread    bcscan
bootes           13    0:00   0:00      336K Rendez   bcscan
bootes           14    0:00   0:00      336K Rendez   bcscan

Please note that here we see a different case: there are three
spurious processes. At the plant (same test) there is only one.

Then the reader is plugged in again

bootes           13    0:00   0:00      336K Rendez   bcscan
bootes           14    0:00   0:00      336K Rendez   bcscan
bootes          432    0:00   0:00      336K Rendez   bcscan
bootes          434    0:00   0:00      336K Pread    bcscan
bootes          435    0:00   0:00      336K Rendez   bcscan
bootes          436    0:00   0:00      336K Rendez   bcscan
bootes          437    0:00   0:00      336K Open     bcscan


Here mount /srv/bcscan /n/bc gives an empty /n/bc but doesn't
complain.

adriano
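
The three listings above can be compared mechanically. The following is a minimal sketch, not part of the original report: it copies the ps lines from this message and reports which pids are still present in a later snapshot. The helper names `parse_ps` and `survivors` are made up for this illustration.

```python
# ps lines copied from the message above. Columns: user, pid,
# user time, system time, size, state, command.
BEFORE_UNPLUG = """\
bootes           12    0:00   0:00      336K Pread    bcscan
bootes           13    0:00   0:00      336K Rendez   bcscan
bootes           14    0:00   0:00      336K Rendez   bcscan
bootes           19    0:00   0:00      336K Pread    bcscan
"""

AFTER_UNPLUG = """\
bootes           12    0:00   0:00      336K Pread    bcscan
bootes           13    0:00   0:00      336K Rendez   bcscan
bootes           14    0:00   0:00      336K Rendez   bcscan
"""

AFTER_REPLUG = """\
bootes           13    0:00   0:00      336K Rendez   bcscan
bootes           14    0:00   0:00      336K Rendez   bcscan
bootes          432    0:00   0:00      336K Rendez   bcscan
bootes          434    0:00   0:00      336K Pread    bcscan
bootes          435    0:00   0:00      336K Rendez   bcscan
bootes          436    0:00   0:00      336K Rendez   bcscan
bootes          437    0:00   0:00      336K Open     bcscan
"""


def parse_ps(text):
    """Return {pid: state} for every complete ps line in text."""
    procs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        procs[int(fields[1])] = fields[5]
    return procs


def survivors(before, after):
    """Pids present in both snapshots, with their state in the later one."""
    old, new = parse_ps(before), parse_ps(after)
    return {pid: new[pid] for pid in sorted(old.keys() & new.keys())}


print(survivors(BEFORE_UNPLUG, AFTER_UNPLUG))  # {12: 'Pread', 13: 'Rendez', 14: 'Rendez'}
print(survivors(BEFORE_UNPLUG, AFTER_REPLUG))  # {13: 'Rendez', 14: 'Rendez'}
```

In these listings only pid 19 exits on unplug, and by the time the reader is plugged in again pid 12 has also gone, leaving 13 and 14 blocked in Rendez.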





* Re: [9fans] No regression tests
  2014-03-26 18:26       ` Adriano Verardo
@ 2014-03-26 19:13         ` erik quanstrom
  2014-03-26 19:45           ` Adriano Verardo
  0 siblings, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2014-03-26 19:13 UTC
  To: 9fans

> Here mount /srv/bcscan /n/bc gives a readable /n/bc/bcU0/data.
>
> Then the reader is unplugged
>
> bootes           12    0:00   0:00      336K Pread    bcscan
> bootes           13    0:00   0:00      336K Rendez   bcscan
> bootes           14    0:00   0:00      336K Rendez   bcscan
>
> Please note that here we see a different case: there are three
> spurious processes. At the plant (same test) there is only one.
>
> Then the reader is plugged in again
>
> bootes           13    0:00   0:00      336K Rendez   bcscan
> bootes           14    0:00   0:00      336K Rendez   bcscan
> bootes          432    0:00   0:00      336K Rendez   bcscan
> bootes          434    0:00   0:00      336K Pread    bcscan
> bootes          435    0:00   0:00      336K Rendez   bcscan
> bootes          436    0:00   0:00      336K Rendez   bcscan
> bootes          437    0:00   0:00      336K Open     bcscan

i should learn chess so i don't ask questions in serial.

with acid, you can get a backtrace of process 12 and get the fd
it is reading.  /proc/12/fd should have the file descriptor bcscan
thinks is open.  if it

also, since processes 13 and 14 did not wake from rendezvous,
there is a second issue.  maybe you can see how 12 could exit
and leave 13 and 14 hanging.

- erik
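
One generic way a process can exit and leave its peers blocked is sketched below. This is only an illustration of the pattern, not bcscan's code (its source is not shown in this thread) and not a diagnosis. Python threads and a `threading.Event` stand in loosely for Plan 9 processes and a rendezvous point; the function `run` and its parameter are hypothetical.

```python
import threading


def run(reader_wakes_peers):
    """Simulate a reader that exits on device removal, with two waiting peers.

    Returns the names of the peers that woke up and exited.
    """
    wake = threading.Event()  # loose stand-in for a rendezvous point
    exited = []

    def waiter(name):
        # A real peer would block forever; the timeout only keeps this
        # demonstration from hanging.
        if wake.wait(timeout=0.2):
            exited.append(name)

    def reader():
        # Simulated unplug: the read fails and the reader returns.
        # If it does not wake its peers first, they stay blocked.
        if reader_wakes_peers:
            wake.set()

    threads = [threading.Thread(target=waiter, args=(n,)) for n in ("w1", "w2")]
    threads.append(threading.Thread(target=reader))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(exited)


print(run(True))   # ['w1', 'w2']
print(run(False))  # []
```

If something like the second case is happening, the exit path of the reading process is the place to look for a missing wakeup.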




* Re: [9fans] No regression tests
  2014-03-26 19:13         ` erik quanstrom
@ 2014-03-26 19:45           ` Adriano Verardo
  2014-03-26 19:48             ` erik quanstrom
  0 siblings, 1 reply; 9+ messages in thread
From: Adriano Verardo @ 2014-03-26 19:45 UTC
  To: Fans of the OS Plan 9 from Bell Labs

> i should learn chess so i don't ask questions in serial.
Sorry, I don't understand the meaning of this sentence.
The word-for-word translation into Italian makes no sense.
>
> with acid, you can get a backtrace of process 12 and get the fd
> it is reading.  /proc/12/fd should have the file descriptor bcscan
> thinks is open.  if it
>
> also, since processes 13 and 14 did not wake from rendezvous,
> there is a second issue.  maybe you can see how 12 could exit
> and leave 13 and 14 hanging.
>
>
I'll try, even though I don't know acid very well.

How do I get the backtrace of a process? lstk()?

adriano





* Re: [9fans] No regression tests
  2014-03-26 19:45           ` Adriano Verardo
@ 2014-03-26 19:48             ` erik quanstrom
  0 siblings, 0 replies; 9+ messages in thread
From: erik quanstrom @ 2014-03-26 19:48 UTC
  To: 9fans

> I'll try, even though I don't know acid very well.
>
> How do I get the backtrace of a process? lstk()?

lstk() gives the details (including locals); stk() is a basic
backtrace.  see acid(1) or /sys/doc/acid.ps

- erik




* Re: [9fans] No regression tests
       [not found] <214392e810a42ad8a4958929ca150ed5@proxima.alt.za>
@ 2014-03-25 21:38 ` Adriano Verardo
  0 siblings, 0 replies; 9+ messages in thread
From: Adriano Verardo @ 2014-03-25 21:38 UTC
  To: 9fan >> Fans of the OS Plan 9 from Bell Labs

lucio@proxima.alt.za wrote:
>> but the kernel is rebuilt frequently and the sources are updated
>> irregularly, 3 or 4 times a year.
>
> You could bisect the kernel from the history and try to locate the
> change that way.  There have been recent changes to USB, so that's
> where you should look first.
Yes, but perhaps I would diff the Bell and Atom usb sources first,
unless they are organized so differently as to not be comparable at
all, of course.
Anyway, I observe that the latest Atom release works and the latest Bell one
does not.

I have to maintain industrial systems in service. From my personal point
of view this usb problem is a negligible flaw, as the devices must always
stay firmly plugged in.
But the customer thinks differently and I must solve it asap. I'll install
Atom instead of Bell.

adriano
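
The bisection suggested in the quoted text can be sketched as a binary search over an ordered list of source snapshots. This is a generic illustration under stated assumptions: the snapshot labels `s0` to `s7` and the `is_bad` predicate are invented placeholders, and in practice `is_bad` would mean building a kernel from that snapshot and repeating the unplug test by hand.

```python
def first_bad(snapshots, is_bad):
    """Return the index of the first bad snapshot.

    Assumes snapshots[0] is good, snapshots[-1] is bad, and every
    snapshot after the first bad one is also bad.
    """
    lo, hi = 0, len(snapshots) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(snapshots[mid]):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical history: eight snapshots, with the regression
# introduced in the one labelled s5.
snapshots = ["s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7"]
tested = []


def is_bad(snapshot):
    tested.append(snapshot)  # record which snapshots had to be built
    return int(snapshot[1:]) >= 5


print(snapshots[first_bad(snapshots, is_bad)])  # s5
print(tested)  # ['s3', 's5', 's4']
```

With sources updated only 3 or 4 times a year, the number of kernels to build and test this way stays small.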





