From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <027001c28cfe$b509b7d0$6501a8c0@KIKE>
From: "matt"
To: <9fans@cse.psu.edu>
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----=_NextPart_000_026D_01C28CFE.B3DD45C0"
Subject: [9fans] Google File System
Date: Fri, 15 Nov 2002 23:28:28 +0000
Topicbox-Message-UUID: 2193acca-eacb-11e9-9e20-41e7f4b1d025

This is a multi-part message in MIME format.

------=_NextPart_000_026D_01C28CFE.B3DD45C0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Money and mouth in [im]perfect harmony

attached and http://proweb.net/~matt/p9-4/goofs.c
[I had a netiquette conundrum here: when does a post get too big to attach?]

Here's the fs I mentioned the other day.

It started as a look into the Google API, which uses SOAP:
http://www.google.com/apis/

My eyes started to bleed reading that documentation, so I wrote one
that parses the HTML returned from a GET.

I've not used it that much but it did save my life already. We lost some
data from the database [don't ask] and I used goofs to retrieve the
relevant pages from Google's cache of my site and was able to re-create
the data.

I'm pretty sure it does stuff wrong but it seems to work.

To do a search, echo the term into the ctl file.
You can even add extra options such as site restriction
(you can get quite complicated if you try).

It will break when Google changes its HTML output.

example

% goofs -m /usr/matt/gofs
% echo 'factotum site:www.cs.bell-labs.com' > /usr/matt/gofs/ctl

% ls /usr/matt/gofs/factotum+site:www.cs.bell-labs.com
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/2
..
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/19	[goofs tries to get 50 results]

% ls '/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1'

/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/cached
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/description
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/folder
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/related
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/summary
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/title
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/url

% cat /usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/summary

Factotum and SecStore
... A process called factotum is used to hold credentials like passwords
and public/private keypairs and perform cryptographic operations. ...
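
For anyone wondering how the text echoed into ctl turns into the directory name in the listing above, here is a minimal sketch in portable C (not the Plan 9 C that goofs.c is written in). The name ctl_to_query and the caller-supplied output buffer are assumptions made for illustration; goofs.c does the equivalent work in process_ctl_message() with its own str_tr() helper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from goofs.c: copy the text written
 * to ctl into out, dropping leading and trailing blanks/newlines and
 * turning every remaining blank or newline into '+'.
 * Returns out, or NULL if the result does not fit in outsize bytes.
 */
char *
ctl_to_query(const char *msg, char *out, size_t outsize)
{
	size_t start = 0, end = strlen(msg), i, n;

	/* skip leading separators */
	while (start < end && (msg[start] == ' ' || msg[start] == '\n'))
		start++;
	/* drop trailing separators (echo appends a newline) */
	while (end > start && (msg[end - 1] == ' ' || msg[end - 1] == '\n'))
		end--;

	n = end - start;
	if (n + 1 > outsize)
		return NULL;

	for (i = 0; i < n; i++) {
		char c = msg[start + i];
		out[i] = (c == ' ' || c == '\n') ? '+' : c;
	}
	out[n] = '\0';
	return out;
}
```

With this sketch, the string "factotum site:www.cs.bell-labs.com\n" (what echo writes) becomes "factotum+site:www.cs.bell-labs.com", which is the directory name shown in the ls output.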
------=_NextPart_000_026D_01C28CFE.B3DD45C0
Content-Type: application/octet-stream;
	name="goofs.c"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="goofs.c"

/*

8c goofs.c && 8l goofs.8 && mv 8.out goofs

This is my first user level file server.
It does a google search and presents files in the mountpoint
based on the html returned by google
to do a search echo the term into the ctl file
you can even add extra options such as site restriction
(you can get quite complicated if you try)

It will break when google changes its html output


example
% goofs -m /usr/matt/gofs
% echo 'factotum site:www.cs.bell-labs.com' > /usr/matt/gofs/ctl

% ls /usr/matt/gofs/factotum+site:www.cs.bell-labs.com
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/2
..
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/19 [goofs tries to get 50 results]

% ls '/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1'

/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/cached
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/description
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/folder
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/related
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/summary
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/title
/usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/url

% cat /usr/matt/gofs/factotum+site:www.cs.bell-labs.com/1/summary

Factotum and SecStore
... A process called factotum is used to hold credentials like passwords
and public/private keypairs and perform cryptographic operations. ...



*/

#include 
#include 
#include 
#include 
#include 
#include 
#include <9p.h>


Tree *tree=nil;
File *ctrlfile=nil;

enum {
	id_Ctl=1,
	id_Qr=2,
};

typedef struct Aux Aux;
struct Aux {
	int id_code;
	int index;
	Aux *next;
	char *data;
	int datasize;
};

typedef struct QResult QResult;
struct QResult {
	char *search;
	char *url;
	char *title;
	char *summary;
	char *cached;
	char *description;
	char *related;
	char *folder;
};

int strn_tr(char *subject, int count, char c_before, char c_after) {
	int i, k=0;
	if (subject && count > 0)
		for(i=0; iid_code = id_code;
		new_file = createfile(dir, filename, getuser(), perm, (void*)new_aux->index);
		set_file_data(new_file, data, datasize);
		incref(dir);
	}

	decref(dir);

	return new_file;
}

File *
get_or_create_new_file(File *dir, char* filename, int perm, int id_code, char *data, int datasize) {
	File *new_file=nil;
	Aux *new_aux;

	if (dir == nil || filename == nil) return nil;
	incref(dir);
	incref(dir); /* so walk doesn't kill it immediately on failure */

	if (!
	(new_file = walkfile(dir, filename)) ) {
		new_aux = get_aux(0);
		new_aux->id_code = id_code;
		new_file = createfile(dir, filename, getuser(), perm, (void*)new_aux->index);
		set_file_data(new_file, data, datasize);

	}

	decref(dir);

	return new_file;
}

Aux *
get_aux(int index) {
	Aux *a;
	if(index) {
		for(a=aux_list; a && a->index!=index ; a=a->next) ;
	} else {
		a = emalloc9p(sizeof(Aux));
		a->id_code = 0;
		a->data = nil;
		a->datasize = 0;

		if(aux_list) {
			a->index = aux_list->index + 1;
			a->next = aux_list;
		} else {
			a->index = 1;
			a->next = nil;
		}

		aux_list = a;
	}

	return a;
};

void
free_aux(Aux *a) {
	if (!a) return;
	free(a->data);
}

int
set_aux_data(Aux *a, char * data, int datasize) {

	if(a) free(a->data);

	if (data) {
		datasize = datasize + 1;
		a->data = emalloc9p(datasize);
		a->datasize = datasize;
		memcpy(a->data, data, datasize-1);
		a->data[datasize-1] = 0;
	} else {
		a->data = nil;
		a->datasize = 0;
	}

	return a->datasize;
}

char *
set_file_data(File *f, char *data, int datasize) {
	Aux *a=nil;
	if (f && (a = get_aux((int)f->aux))) {
		f->length = set_aux_data(a, data, datasize);
		f->aux = (void*)a->index;
		return a->data;
	}
	return nil;
}

void
fsopen(Req *r) {
	int i;
	i = (int)r->fid->file->aux;
	respond(r, nil);
}

char *
str_append(char *target, char *source) {
	int new_size=0;
	if (source) {
		if (target) {
			new_size = strlen(target) + strlen(source) + 1;
			target = erealloc9p(target, new_size );
			target = strcat(target, source);
		} else {
			target = estrdup9p(source);
		}
	}
	return target;
}

QResult *
new_query_result(char *search) {
	QResult
	*qr = emalloc9p(sizeof(QResult));
	qr->search = estrdup9p(search);
	qr->url = mallocz(1, 1);
	qr->summary = mallocz(1, 1);
	qr->cached = mallocz(1, 1);
	qr->description = mallocz(1, 1);
	qr->related=mallocz(1, 1);
	qr->folder=mallocz(1,1);
	qr->title = mallocz(1,1);
	return qr;
}

QResult *
fill_query_result(QResult *qr, Biobuf *bio_in) {
	char *start_c=nil;
	int siz;
	char block;
	int end_block=0;
	int chop=0;
	char token;
	char *s;

	if(!qr) return nil;

	start_c = Brdstr(bio_in, '=', 0);
	if(start_c) {
		free(start_c); // just consume it
		start_c = nil;
	}

	chop=1;

	qr->url = str_append(qr->url, Brdstr(bio_in, '>', chop));
	s = strchr(qr->url, '/');
	if(s) qr->related = smprint("q=related:%s", s + 1);
	token = '>';
	block = 's';
	chop =0;
	while(token && (start_c=Brdstr(bio_in, token, chop))) {
		siz = Blinelen(bio_in) ;

		if(siz > 28) {
			if ((block == 'f') && strcmp(start_c + (siz - 29), " 19) {
			if (strcmp(start_c + (siz-20), "") == 0) {
				start_c[siz-20] = 0;
				switch(block) {
				case 's' :
					qr->summary = str_append(qr->summary, start_c);
					break;
				case 'd' :
					qr->description = str_append(qr->description, start_c);
					break;
				case 'f' :
					qr->folder = str_append(qr->folder, start_c);
					break;
				}
				end_block = 1;
				block = '>';
				token = ':';
				chop=0;
				goto next_block;
			}
		}

		if (siz > 15) {
			if (strcmp(start_c + (siz-16), "/search?q=cache:") == 0) {
				qr->cached = str_append(qr->cached, "http:");
				qr->cached = str_append(qr->cached, start_c);
				block = 'c';
				token = '>';
				chop=1;

				end_block=0;
				goto next_block;
			}
		}
		if (siz > 13) {
			if (strcmp(start_c + (siz-14), "") == 0) {
				end_block=0;
				token = ':';
				chop=0;
				block = '>';
				goto next_block;
			}
		}

		if (siz > 11) {
			if (strcmp(start_c + (siz-12), "Description:") == 0) {
				block ='d';
				end_block=0;
				token = '>';
				chop=0;
				goto next_block;
			}
		}
		if (siz > 8) {
			if (strcmp(start_c + (siz-9), "Category:") == 0) {
				block ='f';
				end_block=0;
				token = ':';
				chop=0;
				goto next_block;
			}
		}
		if (siz > 7) {
			if (strcmp("", start_c +(siz-8)) == 0) {
				token = 0;
				goto next_block;
			}
		}

		if (strcmp("", start_c) == 0)
			if (block == 'd' ) goto next_block;
		if (strcmp(" ", start_c) == 0)
			if (block == 'f' ) goto next_block;
		if (strcmp("", start_c) == 0)
			if (block == 'd') goto next_block;
		if(strcmp("", start_c) == 0)
			if (block == 'd') goto next_block;
		if(strcmp("/", start_c) == 0)
			if (block =='f') {
				token = '>';
				chop = 1;
				goto next_block;
			}

		if(!end_block) {
			switch (block) {
			case 's' :
				qr->summary = str_append(qr->summary, start_c);
				break;
			case 'c' :
				qr->cached = str_append(qr->cached, start_c);
				end_block=1;
				chop =0;
				break;
			case 'd':
				qr->description = str_append(qr->description, start_c);
				break;
			case 'f':
				qr->folder = str_append(qr->folder, start_c);
				end_block = 1;
				token = '>';
				chop = 0;
				break;
			case '>' :
				// skip
				break;
			}
			goto next_block;
		}
next_block :
		free(start_c);
		start_c = nil;
	}

	free(start_c);

	return qr;
}


char*
url_encode(char *string) {
	return string;
}
void
free_qresult(QResult *qr) {
	if (qr == nil) return;

	free(qr->url);
	free(qr->summary);
	free(qr->search);
	free(qr->cached);
	free(qr->description);
	free(qr->related);
	free(qr->folder);
	free(qr->title);

}

File
*
create_fs(File *qr_root, QResult *qr, int qnum) {
	File *qr_dir;
	char *qnumtxt;
	/* create_fs is declared to return File*, so return nil rather than nothing */
	if(qr_root == nil || qr == nil) return nil;

	qnumtxt = smprint("%d", qnum);
	if (qr_dir=create_new_file(qr_root, qnumtxt, DMDIR|0777, 0, nil, 0)) {
		create_new_file(qr_dir, "url", 0444, id_Qr, qr->url, strlen(qr->url));
		create_new_file(qr_dir, "title", 0444, id_Qr, qr->title, strlen(qr->title));
		create_new_file(qr_dir, "summary", 0444, id_Qr, qr->summary, strlen(qr->summary));
		create_new_file(qr_dir, "cached", 0444, id_Qr, qr->cached, strlen(qr->cached));
		create_new_file(qr_dir, "description", 0444, id_Qr, qr->description, strlen(qr->description));
		create_new_file(qr_dir, "related", 0444, id_Qr, qr->related, strlen(qr->related));
		create_new_file(qr_dir, "folder", 0444, id_Qr, qr->folder, strlen(qr->folder));

	}
	free(qnumtxt);
	return nil;
}

char *
process_query(char *search_item, int start_q, int num_q) {
	int gfd;
	int not_finished;
	int num_results;
	Biobuf *gbio_in;
	char *html = nil;
	char *post=nil;
	int siz;
	QResult *qr;
	File *qr_root;

	if (!search_item) {
		return nil;
	}

	/* dial returns -1 on failure; any non-negative fd is valid */
	if ( (gfd = dial("tcp!www.google.co.uk!80", 0, 0, 0)) >= 0) {
		gbio_in = emalloc9p(sizeof(Biobuf));
		Binit(gbio_in, gfd, OREAD);
		/* HTTP header lines end in CRLF; the User-Agent uses '/' (a backslash starts an octal escape in C) */
		post = smprint("GET http://www.google.com/search?q=%s&sourceid=mozilla-search&start=%d&num=%d"
			"&as_oq=&as_eq=&lr=&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i&safe=images HTTP/1.0\r\n"
			"User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.0) Gecko/20020530\r\n\r\n",
			url_encode(search_item), start_q, num_q);
		/* print through "%s" so a '%' in the search term is not taken as a format verb */
		fprint(gfd, "%s", post);
		free(post);
		not_finished = 1;
		num_results = 0;

		if(!
		(qr_root = get_or_create_new_file(tree->root, search_item, DMDIR|0777, 0, nil, 0))) return nil;

		while(not_finished && (html = Brdstr(gbio_in, '>', 0))) {
			siz = Blinelen(gbio_in);

			if (siz > 7) {
				if ( strcmp("", html + siz-8) == 0) {
					num_results++;
					qr = new_query_result(search_item) ;
					/* stray ';' after the condition removed so the if guards create_fs */
					if (fill_query_result(qr, gbio_in))
						create_fs(qr_root, qr, ++start_q);
					free_qresult(qr);

				} else {
					not_finished = strcmp("", html + siz-8);
				}

			}

			free(html);
		}
	}
	return nil;
}

char *
process_ctl_message(Aux *a) {
	char *reply = nil;
	char *query=nil;
	char *lastchar;
	char *firstchar;
	if(!(a && a->data)) return nil;

	if (a->data) {
		query = estrdup9p(a->data);
		str_tr(query, '\n', ' ');
		for(firstchar=query; *firstchar== ' '; firstchar++);
		for(lastchar = query +strlen(query)-1; *lastchar == ' '; lastchar--) *lastchar = 0;
		str_tr(firstchar, ' ', '+');
		reply = process_query(firstchar, 0, 50);
		free(query);
	}

	return reply;
}

void
fswrite(Req *r) {
	int index= (int)r->fid->file->aux;
	Aux *a;
	char * errstr = nil;

	if(index) {
		a = get_aux(index);
		if (a && set_aux_data(a, r->ifcall.data, r->ifcall.count) ) {
			r->ofcall.offset = a->datasize;
			if (a->id_code == id_Ctl)
				errstr = process_ctl_message(a);
		}
	}
	respond(r, errstr);
	free(errstr);
}


void
fsread(Req *r)
{
	int index;
	Aux *a=nil;
	index= (int)r->fid->file->aux;

	a = get_aux(index);

	if (a && a->data && r->ifcall.offset < a->datasize) {
		r->ofcall.data = a->data + r->ifcall.offset;
		r->ofcall.count = a->datasize - r->ifcall.offset ;
	} else {
		r->ofcall.data = nil;
		r->ofcall.count = 0;
	}
	respond(r, nil);
}

void
fsend (Srv *) {
	Aux *a;
	while (aux_list) {
		a = aux_list->next;
		free_aux(aux_list);
		aux_list = a;
	}
}

Srv fs =
{
	.open=	fsopen,


static void
usage(void)
{
	fprint(2, "usage: goofs [-m mtpt] \n");
	exits("usage");
}



void
main(int argc, char **argv)
{
	Aux *new_aux;
	char *mtpt;

	mtpt = "/mnt/goofs";

	ARGBEGIN{
	case 'm':
		mtpt = ARGF();
		break;
	}ARGEND;
	if(argc != 0)
		usage();

	tree = fs.tree = alloctree(getuser(), getuser(), DMDIR|0555, nil);
	ctrlfile = create_new_file(tree->root, "ctl", 0666, id_Ctl, nil, 0) ;

	postmountsrv(&fs, nil, mtpt, MREPL);

	exits(nil);

}

------=_NextPart_000_026D_01C28CFE.B3DD45C0--
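
The url_encode() in the attached goofs.c returns its argument unchanged, so characters such as '&' or '"' in a search term go into the request line as they are. A minimal percent-encoding sketch is given here in portable C rather than Plan 9 C. The names is_unreserved and url_encode_sketch, and the caller-supplied buffer, are assumptions for illustration and not part of goofs.c. Passing '+' through untouched is also an assumption, made because goofs has already turned spaces into '+' before the query reaches this point.

```c
#include <stddef.h>

/* Characters that may appear unescaped in a URL component. */
static int
is_unreserved(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
	       (c >= '0' && c <= '9') ||
	       c == '-' || c == '.' || c == '_' || c == '~';
}

/*
 * Hypothetical replacement for the stub: percent-encode in into out.
 * '+' is copied as-is (assumption: it already stands for a space).
 * Returns out, or NULL if the result does not fit in outsize bytes.
 */
char *
url_encode_sketch(const char *in, char *out, size_t outsize)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t o = 0;
	unsigned char c;

	if (outsize == 0)
		return NULL;
	for (; (c = (unsigned char)*in) != '\0'; in++) {
		if (is_unreserved(c) || c == '+') {
			if (o + 1 >= outsize)	/* keep room for the NUL */
				return NULL;
			out[o++] = (char)c;
		} else {
			if (o + 3 >= outsize)	/* "%XX" plus the NUL */
				return NULL;
			out[o++] = '%';
			out[o++] = hex[c >> 4];
			out[o++] = hex[c & 0x0F];
		}
	}
	out[o] = '\0';
	return out;
}
```

Under these assumptions, "factotum+site:www.cs.bell-labs.com" encodes to "factotum+site%3Awww.cs.bell-labs.com" and "a&b=c" encodes to "a%26b%3Dc". Only the string placed in the URL would be encoded; the directory name under the mount point would stay as written.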