From: Gerd Stolpmann
To: Diego Olivier Fernandez Pons
Cc: caml-list
Date: Sun, 29 Jan 2012 21:16:17 +0100
Subject: Re: [Caml-list] SQL engine in OCaml with client side cache

On Sunday, 29.01.2012, 19:29 +0100, Diego Olivier Fernandez Pons wrote:

> Caml-list,
>
> [Gerd Stolpmann wrote]
> > I think this is not possible. SQL always needs access to the complete
> > table for executing queries (including the complete indexes).

"Not possible" in the sense of: yielding a result with better performance
than without a cache. Also, I was assuming the general case.

> I am surprised by your comments. Many systems have two-layer data
> storage (massive but slow one, fast but limited one), and that doesn't
> affect their semantics.

The problem here is to define locality. Imagine you have a table with two
indexes, and the indexes are pretty independent of each other in terms of
locality (i.e. when y is the direct successor of x in one index, the
distance between them in the other index is large). How do you order the
rows of the table? For efficient caching it is essential that you can
guess which other data items will be accessed next. If you have
multi-dimensional locality as in this example, though, that is difficult
to guess.

The SQL server "solves" this by deciding on a query strategy beforehand.
E.g. for an inner join involving two indexes it is usually possible to
iterate over one index and then do lookups in the other one. This is slow
enough on a local disk - if you add more latency for the lookups in the
second index (because they are not in the cache), performance gets even
worse. Note that there are also join algorithms that really need the
complete tables, like sort-and-merge.
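To make the latency argument concrete, here is a minimal OCaml sketch of
such an index nested-loop join. All names are invented for illustration
(the option-returning probes stand in for a cache lookup and a network
lookup); it is not how a real engine is written:

  (* Hypothetical sketch of an index nested-loop join over two indexes:
     [scan_a] walks index A sequentially; for every row we probe index B.
     A probe that misses the client-side cache costs one round trip. *)
  type row = { key_a : int; key_b : int; payload : string }

  let nested_loop_join
      ~(scan_a : unit -> row list)           (* sequential scan over index A *)
      ~(probe_b_local : int -> row option)   (* lookup in the client cache *)
      ~(probe_b_remote : int -> row option)  (* lookup over the network *)
    : (row * row) list =
    scan_a ()
    |> List.filter_map (fun r ->
           let hit =
             match probe_b_local r.key_b with
             | Some _ as x -> x                (* cheap: cache hit *)
             | None -> probe_b_remote r.key_b  (* expensive: one round trip per miss *)
           in
           Option.map (fun r' -> (r, r')) hit)

With N rows coming from the first index and a cache miss rate m, you pay
roughly N * m round trips, which is exactly the latency multiplication
described above. Sort-and-merge avoids the per-row probes, but it needs
both inputs completely, so a partial cache does not help it either.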
> Consider a compiler (heap / registers), an operating system (swap
> files, RAM) or even a typical SQL database that keeps tables in memory
> for fast access.

This works on the page level only (i.e. databases either rely on the page
cache of the OS, or implement their own page cache following a similar
design). So it is pretty low-level. If you try to have a cache on a
higher level, you run into the problem of how to get the data via SQL
statements. The following simple strategy does not work: if the statement
cannot be fully answered from the cache, just send it to the server and
put the response rows into the cache. It fails because you don't know
which rows have been omitted, so you would not be able to answer range
queries from the cache.

So, to have a working cache you need fairly direct access to the data,
e.g. "retrieve all data from this position to that position", both for
tables and for indexes. There are SQL cursors, but as far as I know they
are not precise enough for this. I think this is probably solvable if you
know the data definition.

> This is the same but the hard drive is on the server side, the RAM on
> the client side and they communicate by Internet.
>
> I am not saying that implementing a generic cache / swapping system
> for an SQL database is easy.

I think if you solved the generic case, you'd be a candidate for the
Turing award. (Just as a side note: lack of scalability has always been a
problem for SQL databases. Currently, many companies are moving away from
this technology for their giant data warehouses just because of this. Of
course, those are not read-only but read-write workloads, but even having
read-only scalability would be a big achievement.)

> That's why I added that for this specific application I can easily
> compute a superset of the data the user will need. Let's take an
> example: say my database contains sales history data for a convenience
> store chain, for all stores in France and all products for the last 5
> years. If my user is the Coca-Cola replenishment manager in Paris, he
> only needs sales of Coca-Cola products for the last 5 similar days for
> each store in Paris. Thereafter I can generate a query on the server
> that sends that superset of data to the client and let the SQL client
> engine work on that.

This approach could indeed work. Basically, you define the locality
directly.

> I may have underestimated the amount of work to be done, because I
> thought this would be on the easier side. My idea was the following:
>
> Server side
> - 3rd party SQL database
> - OCaml bindings
> - Web communication interface
>
> Client side
> - SQL parser written in OCaml + interpreter
> - control system to guess the superset of data and request it from the
>   server
> - Web communication interface
> - output to JavaScript widget toolkit
>
> I was expecting to find most of the pieces already done and only having
> to glue them together. I was actually more afraid of the JavaScript
> part than the SQL one... You can even find SQL engines written in
> JavaScript out there, to run in a web browser, like Jade
> http://jacwright.com/489/in-memory-javascript-database/ or JOrder
> (JSON) https://github.com/danstocker/jorder
>
> In the end that's nothing but arrays and for loops. And an ugly
> parser :-(

If you can define the superset in a way that only depends on the client
but not on the client's queries, an option would be to use an existing
in-memory SQL database as the cache. SQLite has such a mode (use
":memory:" as the filename), and AFAIK MySQL has one too.
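To sketch what the ":memory:" route could look like with the OCaml
sqlite3 bindings: the schema, fetch_superset_from_server and the error
handling below are invented for illustration, and I am quoting the
db_open/exec API from memory, so take it as an assumption rather than a
recipe.

  (* Sketch: fill an in-memory SQLite database with the precomputed
     superset, then answer the user's queries locally from it. *)
  let build_local_cache
      ~(fetch_superset_from_server : unit -> (int * string * float) list) =
    let db = Sqlite3.db_open ":memory:" in    (* purely in-memory database *)
    let run sql =
      match Sqlite3.exec db sql with
      | Sqlite3.Rc.OK -> ()
      | rc -> failwith ("sqlite error: " ^ Sqlite3.Rc.to_string rc)
    in
    run "CREATE TABLE sales (store_id INTEGER, product TEXT, amount REAL)";
    List.iter
      (fun (store_id, product, amount) ->
         (* a real version would use prepared statements instead of
            building SQL strings *)
         run (Printf.sprintf "INSERT INTO sales VALUES (%d, '%s', %f)"
                store_id product amount))
      (fetch_superset_from_server ());
    db

  (* the user's interactive queries then run only against [db], e.g.: *)
  let total_per_store db =
    let rc =
      Sqlite3.exec db
        ~cb:(fun row _headers ->
               match row.(0), row.(1) with
               | Some store, Some total ->
                 Printf.printf "store %s: %s\n" store total
               | _ -> ())
        "SELECT store_id, SUM(amount) FROM sales GROUP BY store_id"
    in
    assert (rc = Sqlite3.Rc.OK)

Refreshing the cache then just means dropping the :memory: database and
fetching the superset again; a real version would also use prepared
statements (Sqlite3.prepare and Sqlite3.bind) rather than string
concatenation.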
Gerd

--
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:  http://www.camlcity.org/contact.html
Company homepage: http://www.gerd-stolpmann.de
------------------------------------------------------------