The Unix Heritage Society mailing list
* [TUHS] Archeology: AberMUD, BCPL, ec.
@ 2019-01-30 19:51 Richard Salz
  2019-01-31  7:15 ` Alec Muffett
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Salz @ 2019-01-30 19:51 UTC (permalink / raw)
  To: tuhs

Some folks are trying to figure out how to get AberMUD source online and
working; see https://twitter.com/larsbrinkhoff/status/1056823314272960512

Sample code at
https://raw.githubusercontent.com/larsbrinkhoff/abermud/master/abermud1/text/timelock.b


* Re: [TUHS] Archeology: AberMUD, BCPL, ec.
  2019-01-30 19:51 [TUHS] Archeology: AberMUD, BCPL, ec Richard Salz
@ 2019-01-31  7:15 ` Alec Muffett
  2019-01-31  7:28   ` Lars Brinkhoff
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Alec Muffett @ 2019-01-31  7:15 UTC (permalink / raw)
  To: Richard Salz; +Cc: tuhs

Has anyone ever attempted to OCR a video, perhaps by breaking it into frames
and then aggregating the results, using multiple frames to correct each
other?
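
The aggregation idea can be sketched as a per-column majority vote over several OCR readings of the same page. This is only an illustration: it assumes the OCR step has already produced one list of text lines per frame, and that the readings line up character for character (plausible for a monospaced listing, but not guaranteed). The names vote_chars and vote_page are hypothetical.

```python
from collections import Counter
from itertools import zip_longest


def vote_chars(readings):
    """Merge several OCR readings of one line by per-column majority vote."""
    merged = []
    for column in zip_longest(*readings, fillvalue=""):
        # Most common character in this column; ties go to the earliest reading.
        winner, _ = Counter(column).most_common(1)[0]
        merged.append(winner)
    return "".join(merged)


def vote_page(frames):
    """frames: one list of text lines per video frame, all showing the same page."""
    return [vote_chars(lines) for lines in zip_longest(*frames, fillvalue="")]
```

If a frame drops or inserts a character, the columns shift and the vote degrades, so a real attempt would probably need an alignment step before voting.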


* Re: [TUHS] Archeology: AberMUD, BCPL, ec.
  2019-01-31  7:15 ` Alec Muffett
@ 2019-01-31  7:28   ` Lars Brinkhoff
  2019-01-31 14:42   ` Clem Cole
  2019-02-01  5:08   ` Jason Stevens
  2 siblings, 0 replies; 10+ messages in thread
From: Lars Brinkhoff @ 2019-01-31  7:28 UTC (permalink / raw)
  To: Alec Muffett; +Cc: tuhs

Alec Muffett wrote:
> Has anyone ever attempted to OCR a video, perhaps by breaking into
> frames and then aggregating the results, using multiple frames to
> correct each other?

I had in mind to just manually pick a good frame for each page and type
it in by hand.  I have tried to OCR program listings before, with rather
poor results.


* Re: [TUHS] Archeology: AberMUD, BCPL, ec.
  2019-01-31  7:15 ` Alec Muffett
  2019-01-31  7:28   ` Lars Brinkhoff
@ 2019-01-31 14:42   ` Clem Cole
  2019-01-31 19:34     ` Lawrence Stewart
  2019-02-01  5:08   ` Jason Stevens
  2 siblings, 1 reply; 10+ messages in thread
From: Clem Cole @ 2019-01-31 14:42 UTC (permalink / raw)
  To: Alec Muffett; +Cc: The Eunuchs Hysterical Society

I'm not sure if the old DEC CRL tech reports are still around.  At one
time before the Compaq-tion, some folks at CRL and the folks at Boston
Public Library and WGBH were working with video and trying to extract all
sorts of text from it.  I do not remember how successful they were, but
there might be some hints in their tech reports.  I'll ask around and see
if I can turn anything up.  Part of the problem I have is that I don't
remember who was doing that work, but some of my friends might.

Clem

* Re: [TUHS] Archeology: AberMUD, BCPL, ec.
  2019-01-31 14:42   ` Clem Cole
@ 2019-01-31 19:34     ` Lawrence Stewart
  2019-01-31 19:45       ` Lawrence Stewart
  0 siblings, 1 reply; 10+ messages in thread
From: Lawrence Stewart @ 2019-01-31 19:34 UTC (permalink / raw)
  To: Clem Cole; +Cc: The Eunuchs Hysterical Society

I was at CRL from 1989 to 1994.  I sent an inquiry to our informal mailing list.

We had written an audio server along the lines of the X server (http://www.hpl.hp.com/techreports/Compaq-DEC/CRL-93-8.pdf) and Tom Levergood wrote an application called Store24 to keep a rolling 24-hour history of WBUR (local NPR station).  We thought about using speech recognition to build a searchable index for it.

The next idea was to do the same thing for video, perhaps using the closed captioning feed to develop the index.  Dave Wecker (now at Microsoft Research) reports working on a system that extracted data from NPR news streams and would then find the appropriate audio or video clip.  He’s not sure he published that.

Jim Gettys cites http://www.hpl.hp.com/techreports/Compaq-DEC/CRL-99-2.pdf (Indexing Multimedia for the Internet) and notes that all the DEC techreports are hidden away at http://www.hpl.hp.com/techreports/ (choose “Browse by year” and select Compaq/DEC).

-Larry


* Re: [TUHS] Archeology: AberMUD, BCPL, ec.
  2019-01-31 19:34     ` Lawrence Stewart
@ 2019-01-31 19:45       ` Lawrence Stewart
  0 siblings, 0 replies; 10+ messages in thread
From: Lawrence Stewart @ 2019-01-31 19:45 UTC (permalink / raw)
  To: Clem Cole; +Cc: The Eunuchs Hysterical Society

A followup from TV Raman, now at Google:

> We also did an intern project with Arjen De Vries (Tom's intern, who
> became my intern after Tom left), where we:
> 
> 1. Converted the caption stream into an SGML document indexed by time.
> The caption stream came down in dribs and drabs of the form "turn
> background yellow, foreground white, place this text"; that turned
> into the SGML document, with each element tagged with time.
> 
> 2. Indexed that collection of SGML documents. The content
> stream was Tom's ring buffer of the CNN live feed (6 hours was what we
> stored, from memory).
> 
> 3. Built a simple-minded search engine over the SGML documents,
> using the CRL reco engine for getting user queries (you could also just
> type the query at a search box). It did the search over the
> caption-doc index, found the time-stamp and played the video.
> 
> Arjen may have published some of this as his final-year Master's project
> out of the University of Twente, likely summer 1995.
> -- 
> Id: kg:/m/0285kf1
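
The three steps in the quoted note (time-tagged captions, an index over them, and a search that returns a time-stamp) can be sketched roughly as follows. This is not the CRL code: it assumes the captions have already been reduced to (seconds, text) pairs, uses a plain in-memory inverted index in place of the SGML documents, and the names build_index and search are hypothetical.

```python
from collections import defaultdict


def build_index(captions):
    """captions: iterable of (seconds, text) pairs, one per caption element."""
    index = defaultdict(list)
    for seconds, text in captions:
        # Record each distinct word of the caption against its time-stamp.
        for word in set(text.lower().split()):
            index[word].append(seconds)
    return index


def search(index, query):
    """Return sorted time-stamps whose caption contains every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)
```

A player would then seek to the first returned time-stamp in the stored feed. Punctuation is not stripped here, so "today." and "today" would count as different words.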

I searched for Arjen De Vries and found

https://pdfs.semanticscholar.org/fb10/b792fb209e0d347cd14430fbb446c1b178f3.pdf
“Radio and Television Information Filtering through Speech Recognition”
which in turn cites his Master’s thesis from 1995.




* Re: [TUHS] Archeology: AberMUD, BCPL, ec.
  2019-01-31  7:15 ` Alec Muffett
  2019-01-31  7:28   ` Lars Brinkhoff
  2019-01-31 14:42   ` Clem Cole
@ 2019-02-01  5:08   ` Jason Stevens
  2019-02-01  8:09     ` Steve Nickolas
  2 siblings, 1 reply; 10+ messages in thread
From: Jason Stevens @ 2019-02-01  5:08 UTC (permalink / raw)
  To: Richard Salz, Alec Muffett; +Cc: tuhs

FFmpeg can extract images from a video file.  I used ImageMagick to do a CGA-palettized version of a video and ffmpeg to stitch it all back together.

I can get the flags.

* Re: [TUHS] Archeology: AberMUD, BCPL, ec.
  2019-02-01  5:08   ` Jason Stevens
@ 2019-02-01  8:09     ` Steve Nickolas
  0 siblings, 0 replies; 10+ messages in thread
From: Steve Nickolas @ 2019-02-01  8:09 UTC (permalink / raw)
  To: Jason Stevens; +Cc: tuhs

On Fri, 1 Feb 2019, Jason Stevens wrote:

> FFmpeg can extract images from a video file.  I used ImageMagick to do a CGA-palettized version of a video and ffmpeg to stitch it all back together.
>
> I can get the flags..

I can't remember it all, but I want to say you start with
"ffmpeg -i filename.mp4 flnm%04d.png"

I usually use Avisynth on Windows together with ffmpeg to do the opposite,
because it has ImageSource(), but I suppose ffmpeg can do it by itself
too.
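
For anyone scripting this, here is a small sketch that builds both command lines. The extraction command is the one quoted above; the reverse direction uses ffmpeg's image-sequence input with -framerate, and the 25 fps default is purely an assumption (use the source video's real rate). The helper names are hypothetical.

```python
import subprocess


def extract_frames_cmd(video, pattern="flnm%04d.png"):
    """Command that writes one numbered PNG per frame of `video`."""
    return ["ffmpeg", "-i", video, pattern]


def stitch_frames_cmd(pattern, output, fps=25):
    """Command that reassembles numbered images into a video (fps is assumed)."""
    return ["ffmpeg", "-framerate", str(fps), "-i", pattern, output]


def run(cmd):
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(cmd, check=True)
```

Usage would be along the lines of run(extract_frames_cmd("filename.mp4")), followed by whatever per-frame processing is wanted, then run(stitch_frames_cmd("flnm%04d.png", "out.mp4")).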

-uso.


* Re: [TUHS] Archeology: AberMUD, BCPL, ec.
  2019-02-01  4:47 Doug McIlroy
@ 2019-02-01 14:41 ` Nemo
  0 siblings, 0 replies; 10+ messages in thread
From: Nemo @ 2019-02-01 14:41 UTC (permalink / raw)
  To: Doug McIlroy; +Cc: tuhs

On 31/01/2019, Doug McIlroy <doug@cs.dartmouth.edu> wrote:
> I OCR'd a sizable manuscript written on a pretty shabby portable
> typewriter.
>
> I scanned each page twice, making sure to move the paper between scans.
> Then I ran both diff (by words, not lines) and spell to smoke out trouble.
> The word list for a program listing is quite short and easy to generate.
> (Print a list of all the apparent words and visually eliminate the
> nonsense.)
> And a spell check is an easy pipeline of standard utilities.
>
> doug

Very nice!  (I shall remember this technique.)

N.


* Re: [TUHS] Archeology: AberMUD, BCPL, ec.
@ 2019-02-01  4:47 Doug McIlroy
  2019-02-01 14:41 ` Nemo
  0 siblings, 1 reply; 10+ messages in thread
From: Doug McIlroy @ 2019-02-01  4:47 UTC (permalink / raw)
  To: tuhs


> I have tried to OCR program listings before, with rather
> poor results.

I OCR'd a sizable manuscript written on a pretty shabby portable typewriter.

I scanned each page twice, making sure to move the paper between scans.
Then I ran both diff (by words, not lines) and spell to smoke out trouble.
The word list for a program listing is quite short and easy to generate.
(Print a list of all the apparent words and visually eliminate the nonsense.)
And a spell check is an easy pipeline of standard utilities.
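
A rough stand-in for the two checks, for readers without diff and spell to hand. It uses Python's difflib rather than the Unix tools described above, compares two scans word by word, and flags words that are missing from a hand-built vocabulary. The function names and the tokenising rule (a letter followed by letters or digits) are assumptions made for illustration.

```python
import difflib
import re


def word_diff(scan_a, scan_b):
    """Return (words_from_a, words_from_b) pairs where the two scans disagree."""
    a, b = scan_a.split(), scan_b.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def unknown_words(text, vocabulary):
    """Return the sorted apparent words in `text` that are not in `vocabulary`."""
    found = set(re.findall(r"[a-z][a-z0-9]*", text.lower()))
    return sorted(found - set(vocabulary))
```

The vocabulary for a program listing would be the language's keywords plus the identifiers that survive a visual check of the apparent word list, as described above.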

doug
