From: brz-systemd-dev@intma.in
To: 9fans@9fans.net
Subject: Re: [9fans] text database Kirara
Date: Tue, 6 Aug 2013 14:14:07 -0400
Message-ID: <7fb94c8c89d208107f12d7e69d02f2d6@neinchan.znet>
In-Reply-To: <5E38D9FC-75B6-40C9-AB64-E210DACE0B4E@ar.aichi-u.ac.jp>
[-- Attachment #1: Type: text/plain, Size: 622 bytes --]
I've played around with Kirara for a couple hours, now, and am pretty
surprised at how simple it is. It's already become integrated into my
workflow. Being able to quickly (and easily) search for relevant
snippets of code throughout the system is quite useful.
I feel compelled to mention that the code is abnormally high in
quality. (This is seen, even in the rc scripts)
Now I'm going to have to look through your other projects.
Thanks for releasing this.
- BurnZeZ
Bug:
kirara-1.1/INSTALL:9: mkdir -p $kirarar/bin/^(rc $objtype)
Here (and on line 11), '$kirarar' is used instead of '$kirara'.
[-- Attachment #2: Type: message/rfc822, Size: 7706 bytes --]
From: arisawa <arisawa@ar.aichi-u.ac.jp>
To: 9fans@9fans.net
Subject: [9fans] text database Kirara
Date: Tue, 6 Aug 2013 10:14:36 +0900
Message-ID: <5E38D9FC-75B6-40C9-AB64-E210DACE0B4E@ar.aichi-u.ac.jp>
Hello 9fans,
I have written a text database named Kirara.
The following is a brief introduction to Kirara.
If you are interested, get Kirara from:
http://plan9.aichi-u.ac.jp/netlib/kirara/
Kenji Arisawa
-------------
Kirara
-------------
Kirara is a text indexing/retrieval tool for Plan 9.
Personal use: index/retrieve local files.
Kirara is based on an idea similar to that of Glimpse.
(1) indexing + grep
(2) multi-level indexing
(a) small space for indexing
(b) small update time
(c) quick search
Note the trade-off:
small index <-> quick search
Kirara builds more index data -> quicker search
Glimpse uses single-level indexing.
-------------
Query
Kirara does not support phrase search.
The database is an index of words,
supporting:
QE mode (query expression mode)
'&', '|', '*'
Examples:
'snoopy&html'
'snoop*&htm*'
RE mode (regular expression mode)
'&', RE
where RE denotes regular expression.
Example:
'sn.*y&h.+l'
RE mode is a bit slow (a few seconds).
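
To illustrate how a QE-mode query could be evaluated against a word index, here is a minimal Python sketch. It is not Kirara's code (Kirara is built from rc scripts and C tools). The function name qe_search, the toy index, and the precedence rule ('|' separates alternatives, each alternative is an '&'-joined list of terms) are assumptions made for illustration only.

```python
import re

def qe_search(query, index):
    """Evaluate a QE-style query against index (dict: word -> set of dirs).

    Assumptions (not confirmed by the Kirara documentation):
    '|' separates alternatives, '&' joins terms inside an alternative,
    and '*' matches any run of characters within a word.
    """
    result = set()
    for alternative in query.lower().split("|"):
        dirs = None
        for term in alternative.split("&"):
            # Turn the term into a regular expression: only '*' is special.
            rx = re.compile(re.escape(term).replace(r"\*", ".*"))
            hit = set()
            for word, where in index.items():
                if rx.fullmatch(word):
                    hit |= where
            # '&' means a directory must contain every term.
            dirs = hit if dirs is None else dirs & hit
        result |= dirs
    return result

# Toy index with made-up directory names.
index = {
    "snoopy": {"/d1", "/d2"},
    "snoop": {"/d3"},
    "html": {"/d1"},
    "htm": {"/d3"},
}
print(sorted(qe_search("snoopy&html", index)))   # ['/d1']
print(sorted(qe_search("snoop*&htm*", index)))   # ['/d1', '/d3']
```

The linear scan over all words is only for brevity; a real index would narrow the candidate words first.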
-------------
Words
Two or more runes.
All words are converted to lower case.
In English, words are composed of alphabetic characters.
The minimum number of runes is configurable.
Assumption:
Text is composed of space-separated words.
This holds for English and many European languages,
but not for Japanese.
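
The word rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name words and the parameter min_runes are hypothetical, and the sketch treats a word as a run of ASCII letters, which may differ from what Kirara's own tools accept.

```python
import re

def words(text, min_runes=2):
    """Return lower-cased alphabetic words of at least min_runes runes.

    Assumption: a word is a maximal run of ASCII letters.
    """
    # Lower-case first, then pick out runs of letters.
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if len(w) >= min_runes]

print(words('case Qsnoop: devdir(c, q, "snoop")'))
# ['case', 'qsnoop', 'devdir', 'snoop']
```

Single-letter tokens such as c and q are dropped by the two-rune minimum.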
-------------
The user's interface
Best suited to Rio
term% kfind snoop
G snoop /sys/src/9/ip/
G snoop /sys/src/cmd/spell/
G snoop /sys/src/9/kw/
...
term% G snoop /sys/src/9/ip
devip.c:34: Qsnoop,
devip.c:95: case Qsnoop:
devip.c:98: devdir(c, q, "snoop", qlen(cv->sq), cv->owner, 0400, dp);
...
Note that the search takes two steps:
1. find directories
2. find files and their contents
Step 2 is actually 'grep', so we can use RE.
Two-step search is not a weakness but a desirable feature,
because a query can hit a great many files.
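
Step 2 can be pictured with the following Python sketch, which scans the files of one directory and reports matches in the file:line: text form shown above. It is an illustration only, not the G command itself: the name grep_dir is hypothetical, and it assumes that only regular files directly inside the directory are scanned.

```python
import os
import re

def grep_dir(pattern, directory):
    """Return 'name:lineno: text' for each line in directory matching pattern.

    Assumption: only regular files directly inside directory are scanned.
    """
    rx = re.compile(pattern)
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        # errors="replace" keeps the scan going on non-text bytes.
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if rx.search(line):
                    hits.append(f"{name}:{lineno}: {line.strip()}")
    return hits

# Usage (the directory is whatever step 1 reported):
# for hit in grep_dir("snoop", "/sys/src/9/ip"):
#     print(hit)
```

Because step 1 has already narrowed the search to a few directories, a plain scan like this stays cheap.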
-------------
The organization
My example
/n/other/kirara/sysdb
target: (/lib /sys/lib /sys/src /sys/man /sys/include /sys/doc /rc)
/n/other/kirara/usrdb
target: $home/^(bin/rc lib netlib doc adm issues srclib src sources)
Indexing target is fully configurable.
-------------
Multi-Level Indexing
(1) Indexing (top level)
word to directory mapping
sysdb/index # main index # used for RE mode
sysdb/mindex # meta index (alphabetic index) # used for QE mode
sysdb/dind/* # rough index of each directory
sysdb/QTDir # map table (QID, mtime, path-to-dir)
index # word to dir QID
aa 0000000000014f0a
aa 000000000001a1e0
aa 000000000001a26e
mindex # word to range in index
aa 0 126669
ab 126669 491569
ac 491569 1258566
ad 1258566 1852467
...
dind/* # `*' is a directory QID
0000000000014f05
0000000000014f0a
000000000001a1ce
usrdb is the same.
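
The relation between index and mindex can be sketched as follows. This is a guess based only on the samples shown above: it assumes the mindex key is the first two letters of a word and that the two numbers are the start and end byte offsets of that group within index. The names build_mindex and lookup are hypothetical, and the real tools may work differently.

```python
def build_mindex(index_lines):
    """Map each two-letter prefix to the (start, end) byte range of its lines.

    index_lines must be sorted lines of the form "word qid\n".
    Assumption: keys are two-letter prefixes, values are byte offsets.
    """
    mindex = {}
    offset = 0
    for line in index_lines:
        key = line[:2]
        start, _ = mindex.get(key, (offset, offset))
        offset += len(line.encode())
        mindex[key] = (start, offset)
    return mindex

def lookup(word, index_bytes, mindex):
    """Return the QIDs listed for word, reading only its prefix's range."""
    start, end = mindex.get(word[:2], (0, 0))
    qids = []
    for line in index_bytes[start:end].decode().splitlines():
        w, qid = line.split()
        if w == word:
            qids.append(qid)
    return qids

lines = [
    "aa 0000000000014f0a\n",
    "aa 000000000001a1e0\n",
    "ab 000000000001a26e\n",
    "abc 0000000000014f05\n",
]
mindex = build_mindex(lines)
print(mindex)   # {'aa': (0, 40), 'ab': (40, 81)}
print(lookup("abc", "".join(lines).encode(), mindex))
# ['0000000000014f05']
```

The point of the meta index is that a lookup reads only one slice of a large index file instead of scanning all of it.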
(2) Indexing (directory level) # optional
word to file mapping
sysdb/find/*/ind.gz # fine index of the directory (gzipped)
sysdb/find/*/qtn # map table (QID, mtime, name)
where `*' is a directory QID
usrdb is the same as sysdb.
-------------
Experiment
(a) hardware
GA-H61M-USB3-B3
Intel Pentium G860 (3GHz)
DDR3 PC3 4GB
(b) software
9front
cwfs64x
-------------
The performance (compression ratio)
db      target size   num_of_dirs   index size
sysdb:  556 MB        1790 dirs     49 MB
usrdb:  6620 MB       8948 dirs     150 MB
compression ratio: 49/556, about 9% (sysdb)
note: usrdb includes many non-text files.
-------------
The performance (retrieval time)
system dependent
QE search # kfind foo
0.1 seconds.
It is not important to make this time smaller.
(sufficiently small)
RE search # kfind -r foo
a few seconds
-------------
The performance (construction/update)
(a) Construction time
system dependent
Initial construction needs:
10 minutes for sysdb
30 minutes for usrdb
(b) Updating time
two commands for update
mkdb
20 seconds to a few minutes for usrdb
depends largely on state of cache
mkdb1 (currently only for usrdb)
5 to 15 seconds for usrdb
mkdb1 needs an event log
-------------
Scalability
Main factors
(a) retrieval time
QE search: proportional to number of dirs that include the query
RE search: proportional to size of index
(b) initial construction time
proportional to total data
(c) update time
mkdb: proportional to number of dirs and the changes
mkdb1: proportional to changes and size of index
-------------
Used Tools
(1) rc
(2) grep, sed, awk, sort, diff, gzip, ...
(3) some new tools written in C
-------------
What does Kirara mean?
Kirara is the name of a girl who appeared in a Japanese comic book.
(But I have never read the book.)
The name is seldom used in the real world.
To us Japanese, the name suggests something glittering.
I like the name.
-------------
References
[1] GLIMPSE: A Tool to Search Through Entire File Systems
Udi Manber and Sun Wu (1993)
http://webglimpse.net/pubs/glimpse.pdf
[2] Glimpse Documentation
http://webglimpse.net/gdocs/glimpsehelp.html