* How to process simple HTML files with LuaTeX
From: Mojca Miklavec @ 2007-09-13 13:04 UTC (permalink / raw)
To: mailing list for ConTeXt users
Hello,
I was trying to figure out how to process simple HTML files with the
new code, but I fail to understand the details. Here's a simple file I
would like to process:
<html>
<head>
<title>My first HTML2ConTeXt</title>
</head>
<body>
<h1>Main Title</h1>
<p>Some text ...</p>
<h2>Subtitle</h2>
<p>Some text again ...</p>
<h1>Second title</h1>
<p>... and not much more text here either ...</p>
</body>
</html>
And here are my failed attempts:
% engine=luatex
\setupcolors[state=start]
\setuphead[subject][style=bfa,color=blue]
\setuphead[subsubject][style=tfa,color=blue]
\starttext
\xmlload{main}{test.html}{}
\xmlgrab{main}{h1}{h1}
\xmlgrab{main}{h2}{h2}
\startxmlsetups h1
\subject{H1: #1}
\stopxmlsetups
\startxmlsetups h2
\subsubject{H2: #1}
\stopxmlsetups
How do I grab only the title from here?
\xmlfilter{main}{html/head/title}
\xmlflush{main}
\stoptext
Any hints are most welcome.
Thanks a lot,
Mojca
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
archive : https://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___________________________________________________________________________________
* Re: How to process simple HTML files with LuaTeX
From: Hans Hagen @ 2007-09-13 22:22 UTC (permalink / raw)
To: mailing list for ConTeXt users
keep in mind that this is still somewhat experimental
% best define mappings before loading the file
\startxmlsetups all:html
\xmlsetsetup{main}{head|h1|h2}{*}
\stopxmlsetups
\xmlregistersetup{all:html}
% register this so that it's done for each load
\startxmlsetups h1
\subject{\xmlflush{#1}}
\stopxmlsetups
\startxmlsetups h2
\subsubject{\xmlflush{#1}}
\stopxmlsetups
\startxmlsetups head
\startstandardmakeup
THIS IS ABOUT: \xmlfilter{main}{/head/title/text()}
\stopstandardmakeup
\stopxmlsetups
% that's it
\setupcolors[state=start]
\setuphead[subject][style=\bfd,color=blue]
\setuphead[subsubject][style=\bfc,color=blue]
\starttext
\xmlprocess{main}{test.html}{}
\stoptext
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
* Re: How to process simple HTML files with LuaTeX
From: Mojca Miklavec @ 2007-09-14 13:46 UTC (permalink / raw)
To: mailing list for ConTeXt users
On 9/14/07, Hans Hagen wrote:
> Mojca Miklavec wrote:
> > Hello,
> >
> > I was trying to figure out how to process simple HTML files with the
> > new code, but I fail to understand the details. Here's a simple file I
> > would like to process:
> >
> keep in mind that this is still somewhat experimental
Sure :)
That's why I'm sending files for testing :) :) :)
> % best define mappings before loading the file
>
> \startxmlsetups all:html
> \xmlsetsetup{main}{head|h1|h2}{*}
> \stopxmlsetups
>
> \xmlregistersetup{all:html}
>
> % register this so that it's done for each load
>
> \startxmlsetups h1
> \subject{\xmlflush{#1}}
> \stopxmlsetups
>
> \startxmlsetups h2
> \subsubject{\xmlflush{#1}}
> \stopxmlsetups
>
> \startxmlsetups head
> \startstandardmakeup
> THIS IS ABOUT: \xmlfilter{main}{/head/title/text()}
> \stopstandardmakeup
> \stopxmlsetups
>
> % that's it
>
>
> \setupcolors[state=start]
> \setuphead[subject][style=\bfd,color=blue]
> \setuphead[subsubject][style=\bfc,color=blue]
>
> \starttext
>
> \xmlprocess{main}{test.html}{}
>
> \stoptext
Great! This works perfectly and seems much easier to write than the old
code, though I still have no idea how to implement some parts of it:
- where to plug in the entities such as &nbsp;, &le;, ...
- how to catch classes: how to differentiate between <h1>title</h1>
and <h1 class="...">title</h1>
- and some more - there are some simple examples in the attachment
(too long to copy-paste)
Thanks again,
Mojca
[-- Attachment #2: frogs.tex --]
[-- Type: application/x-tex, Size: 2310 bytes --]
[-- Attachment #3: frogs.html --]
[-- Type: text/html, Size: 1628 bytes --]
* Re: How to process simple HTML files with LuaTeX
From: Hans Hagen @ 2007-09-14 14:19 UTC (permalink / raw)
To: mailing list for ConTeXt users
Mojca Miklavec wrote:
> Great! This works perfect and seems much easier to write than the old
> code, though I still have no idea how to implement some parts of it:
> - where to plug in the entities such as &nbsp;, &le;, ...
\xmlutfize{main}
or just load the regular entity handlers (mkii still works and can be
used mixed)
> - how to catch classes: how to differentiate between <h1>title</h1>
> and <h1 class="...">title</h1>
> - and some more - there are some simple examples in the attachment
> (too long to copy-paste)
\doifelse {\xmlatt{#1}{class}} {whatever} {
dothis
} {
dothat
}
Hans
* Re: How to process simple HTML files with LuaTeX
From: Hans Hagen @ 2007-09-14 14:26 UTC (permalink / raw)
To: mailing list for ConTeXt users
Mojca Miklavec wrote:
> On 9/14/07, Hans Hagen wrote:
>> Mojca Miklavec wrote:
>>> Hello,
>>>
>>> I was trying to figure out how to process simple HTML files with the
>>> new code, but I fail to understand the details. Here's a simple file I
>>> would like to process:
>>>
>> keep in mind that this is still somewhat experimental
>
> Sure :)
> That's why I'm sending files for testing :) :) :)
- i'll make a table mapper (need it anyway), cals tables are already
provided
- idem for preformatted and verbatim
- your code:
d[k] = dk:gsub("&nbsp;",' ')
dk = d[k]
d[k] = dk:gsub("&le;", '\\mathematics{\\le}')
-- the same with a local copy that is stored back once
local dk = d[k]
dk = dk:gsub("&nbsp;",' ')
dk = dk:gsub("&le;", '\\mathematics{\\le}')
d[k] = dk
or ....
-- map entity names to replacements; gsub looks up each captured name
mojcasentities = {
nbsp = " ",
le = "\\mathematics{\\le}",
}
d[k] = d[k]:gsub("&(.-);",mojcasentities)
(there probably already is code for that)
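The table-driven replacement above can be tried in plain Lua, outside ConTeXt. What follows is only a minimal sketch: the table name `entities`, the helper `replace_entities` and the sample string are made up for illustration and are not part of the ConTeXt XML code. It relies on documented `string.gsub` behaviour: when the replacement argument is a table, the first capture is used as the key, and a missing key leaves the original match in place.

```lua
-- Hypothetical entity table; names and replacements are illustrative only.
local entities = {
  nbsp = " ",
  le   = "\\mathematics{\\le}",
}

-- Replace every &name; by looking up the captured name in the table.
-- If the lookup yields nil, gsub keeps the original text of the match.
local function replace_entities(str)
  -- the parentheses drop gsub's second return value (the match count)
  return (str:gsub("&(.-);", entities))
end

print(replace_entities("a&nbsp;&le;&nbsp;b and &unknown; stays"))
```

Unknown entities pass through unchanged, so the table can grow one entry at a time.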
* Re: How to process simple HTML files with LuaTeX
From: Mojca Miklavec @ 2007-09-16 10:29 UTC (permalink / raw)
To: mailing list for ConTeXt users
On 9/14/07, Hans Hagen wrote:
> Mojca Miklavec wrote:
>
> > Great! This works perfect and seems much easier to write than the old
> > code, though I still have no idea how to implement some parts of it:
> > - where to plug in the entities such as &nbsp;, &le;, ...
>
> \xmlutfize{main}
Thanks. I saw it, but had no idea how to use it. I need to test more
extensively ... :)
> > - how to catch classes: how to differentiate between <h1>title</h1>
> > and <h1 class="...">title</h1>
> > - and some more - there are some simple examples in the attachment
> > (too long to copy-paste)
>
> \doifelse {\xmlatt{#1}{class}} {whatever} {
> dothis
> } {
> dothat
> }
I have tried exactly that before, but this example fails to work for
me, or I don't know how to apply it:
% test.html
<html>
<body>
<h1>Title 1</h1>
<h1 class="different">Title 2</h1>
</body>
</html>
% test.tex
\startxmlsetups all:html
\xmlsetsetup{main}{h1}{*}
\stopxmlsetups
\xmlregistersetup{all:html}
\startxmlsetups h1
This title belongs to class (\xmlatt{#1}{class}): \xmlflush{#1}.\par
\stopxmlsetups
\starttext
\xmlprocess{main}{test.html}{}
\stoptext
Class always comes out empty.
Thanks a lot,
Mojca
* Re: How to process simple HTML files with LuaTeX
From: Mojca Miklavec @ 2007-09-16 10:31 UTC (permalink / raw)
To: mailing list for ConTeXt users
On 9/14/07, Hans Hagen <pragma@wxs.nl> wrote:
> Mojca Miklavec wrote:
> > On 9/14/07, Hans Hagen wrote:
> >> Mojca Miklavec wrote:
> >>> Hello,
> >>>
> >>> I was trying to figure out how to process simple HTML files with the
> >>> new code, but I fail to understand the details. Here's a simple file I
> >>> would like to process:
> >>>
> >> keep in mind that this is still somewhat experimental
> >
> > Sure :)
> > That's why I'm sending files for testing :) :) :)
>
> - i'll make a table mapper (need it anyway), cals tables are already
> provided
>
> - idem for preformatted and verbatim
Thanks a lot. I'm waiting patiently :)
> - your code:
>
> d[k] = dk:gsub("&nbsp;",' ')
> dk = d[k]
> d[k] = dk:gsub("&le;", '\\mathematics{\\le}')
>
> local dk = d[k]
> dk = dk:gsub("&nbsp;",' ')
> dk = dk:gsub("&le;", '\\mathematics{\\le}')
> d[k] = dk
>
> or ....
>
> mojcasentities = {
> nbsp = " ",
> le = "\\mathematics{\\le}",
> }
>
> d[k] = d[k]:gsub("&(.-);",mojcasentities)
Thanks a lot!
> (there probably already is code for that)
Yes, I saw it, but didn't try to understand what the &(.-) is for.
In any case, that was the wrong place to replace &le; with something.
Thanks again,
Mojca
* Re: How to process simple HTML files with LuaTeX
From: Aditya Mahajan @ 2007-09-16 18:07 UTC (permalink / raw)
To: mailing list for ConTeXt users
On Sun, 16 Sep 2007, Mojca Miklavec wrote:
> On 9/14/07, Hans Hagen <pragma@wxs.nl> wrote:
>> mojcasentities = {
>> nbsp = " ",
>> le = "\\mathematics{\\le}",
>> }
>>
>> d[k] = d[k]:gsub("&(.-);",mojcasentities)
>
> Yes, I saw it, but didn't try to understand what the &(.-) serves for.
(Caveat: I do not really know lua regex, and have not tried out the
code)
Assuming Lua follows standard regex syntax, this means
&   # the character &
(   # start a group
.   # any character
-   # as few as needed
)   # end group
;   # the character ;
so this will match all entities.
If it helps, the equivalent Vim regex would be
&\(.\{-}\);
I guess that $1 (the first group, that is, everything that matches .-)
will be looked up in the mojcasentities table and replaced accordingly.
This looks like a really nice feature of lua. In Ruby and Vim, I often
find myself writing a bunch of similar regex, and always wished there
was something like what lua does.
Aditya
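The "as few as needed" reading of `-` can be checked in plain Lua. This is a small sketch with a made-up sample string. It contrasts the lazy `.-` with the greedy `.*`, both standard Lua pattern items.

```lua
local s = "a&nbsp;b&le;c"

-- ".-" matches as few characters as possible, so each entity is
-- captured on its own.
local lazy = {}
for name in s:gmatch("&(.-);") do
  lazy[#lazy + 1] = name
end
-- lazy now holds "nbsp" and "le"

-- ".*" matches as many characters as possible and therefore runs
-- from the first "&" up to the last ";".
local greedy = s:match("&(.*);")
-- greedy is "nbsp;b&le"

print(table.concat(lazy, ", "), greedy)
```

With the greedy form the two entities would be merged into one bogus name, which is why the lazy form is the one to use for entity lookup.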
* Re: How to process simple HTML files with LuaTeX
From: Hans Hagen @ 2007-09-16 21:55 UTC (permalink / raw)
To: mailing list for ConTeXt users
Mojca Miklavec wrote:
> I have tried exactly that before, but this example fails to work for
> me, or I don't know how to apply it:
i rewrote the parser (both xml and semi-xpath) so it may have been
broken, i'll upload a new beta tomorrow
Hans
* Re: How to process simple HTML files with LuaTeX
From: Hans Hagen @ 2007-09-16 21:58 UTC (permalink / raw)
To: mailing list for ConTeXt users
Aditya Mahajan wrote:
> (Caveat: I do not really know lua regex, and have not tried out the
> code)
they are not regexps but lua pattern expressions :-)
> Assuming lua follows standard regex syntax, this means
>
> &   # The letter &
> (   # start a group
> .   # any character
> -   # As few as needed
> )   # end group
> ;   # the letter ;
>
> so this will match all entities.
just &(.-); with () being the capture
> If it helps, the equivalent vim regex will be
> &\(.\{-}\);
>
> I guess that $1 (the first group, that is everything that matches .-)
%1
> will be compared with mojcasentities table and replaced accordingly.
indeed
> This looks like a really nice feature of lua. In Ruby and Vim, I often
> find myself writing a bunch of similar regex, and always wished there
> was something like what lua does.
the nice thing about many lua features is that less code (lua c code)
gives more power
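To make the `%1` remark concrete, here is a short plain-Lua sketch. The sample string and variable names are illustrative. In a replacement string `%1` refers to the first capture, and the same capture is passed as the argument when the replacement is a function.

```lua
local s = "x&le;y &nbsp;"

-- In a replacement string, %1 stands for the first capture.
local bracketed, count = s:gsub("&(.-);", "[%1]")
-- bracketed is "x[le]y [nbsp]", count is 2

-- A replacement function receives the capture as its argument.
local upper = s:gsub("&(.-);", function(name)
  return name:upper()
end)
-- upper is "xLEy NBSP"

print(bracketed, count, upper)
```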
* Re: How to process simple HTML files with LuaTeX
From: Mojca Miklavec @ 2007-10-02 3:17 UTC (permalink / raw)
To: mailing list for ConTeXt users
On 9/16/07, Hans Hagen wrote:
>
> i rewrote the parser (both xml and semi-xpath) so it may have been
> broken, i'll upload a new beta tomorrow
Hello Hans,
Thanks a lot for fixing the issue with non-working \xmlatt.
Now, I'm still slightly lost regarding two issues:
- How to remove unneeded space? With \ignorespaces?
- How to use the new verbatim code? I have tried to use
\xmlsetfunction{main}{pre}{lxml.verbatim}
but it didn't really work.
% test.tex:
\startxmlsetups all:html
\xmlsetsetup{main}{h1|pre}{*}
\stopxmlsetups
\xmlregistersetup{all:html}
% is this the proper way?
\startxmlsetups h1
\subject{\ignorespaces\xmlflush{#1}}
\stopxmlsetups
\startxmlsetups pre
{\bgroup\tt\obeylines\xmlflush{#1}\egroup}
\stopxmlsetups
\starttext
\xmlprocess{main}{test.html}{}
\stoptext
% test.html
<?xml version="1.0" encoding="utf-8"?>
<html><body>
<h1>
How to get rid of this spacing in some elegant way?
</h1>
<p>Title followed by a paragraph ...</p>
<pre>
and
some source
c@de
</pre>
</body></html>
Also, this fails because of the empty line:
<h1>
How to get rid of this spacing in some
elegant way?
</h1>
Thanks a lot,
Mojca
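On the whitespace question, one possible direction is to normalise the text on the Lua side before it is typeset. The sketch below is plain Lua string handling only. The helper name `normalize_space` is hypothetical, and how (or whether) such a function can be hooked into the ConTeXt XML tree is not shown in this thread, so treat it as an assumption. It is also not suitable for <pre>, where line breaks matter.

```lua
-- Hypothetical helper: collapse whitespace in element text.
local function normalize_space(str)
  -- runs of whitespace, including empty lines, become a single space
  local s = str:gsub("%s+", " ")
  -- drop the single space left at the start and at the end
  s = s:gsub("^ ", "")
  s = s:gsub(" $", "")
  return s
end

local title = "\nHow to get rid of this spacing in some\n\nelegant way?\n"
print("[" .. normalize_space(title) .. "]")
```

Because the empty line is collapsed together with the other whitespace, TeX would never see it as a paragraph break inside the heading.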