From: Albert Krewinkel
Newsgroups: gmane.text.pandoc
Subject: Re: Feature Idea: docx -> HTML table styling
Date: Thu, 16 Jun 2022 08:49:04 +0200
Message-ID: <87y1xxvzt3.fsf@zeitkraut.de>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Cc: Noah Malmed

Hi Noah,

Just chiming in to report on our experiences with tables in a project where we used pandoc to
publish journal articles. Our main goal there was to publish HTML and
PDFs from Docx inputs, with an option to handle JATS as well
(Project: Journal: ).

We found that authors writing in Word essentially use tables as a
graphic and layout tool. Markdown was used as our central format,
which worked extremely well: we converted Docx -> Markdown, fixed
markup when necessary, then published via pandoc. Only tables proved
problematic. For some tables, we ended up writing separate HTML and
PDFs by hand. See the "Sonderausgabe | Podcast" in that journal for
results.

This is just to say that pandoc may not be the right tool if you aim
for *fully automatic* conversion of scholarly Docx articles. Maybe
tables should just be expected to require manual tuning.

I believe that [transpect](https://transpect.github.io) tries to
preserve more of the styling; maybe it is more in line with what you
need? Citation support isn't as complete though (last I heard).

Happy to answer questions about any of this.

Cheers,
Albert


Noah Malmed writes:

> Hello!
>
> We use Pandoc often to convert from docx to HTML, and many of the
> documents we convert include tables. As far as we can tell, almost all
> of the table styling is lost in the docx reader. Specifically, we care
> about 5 things:
>
> 1. Text justification (left, center, or right)
>
> 2. Vertical alignment (top, middle, or bottom)
>
> 3. Text indentation
>
> 4. Cell shading and text color
>
> 5. Table borders
>
> We hope to enhance the docx reader so that these stylings get preserved
> in the AST.
>
> Proposed solutions:
>
> 1. It seems like text justification already exists in the AST through
> the Alignment value. It just needs to get implemented in the docx
> reader, as described in this issue:
> https://github.com/jgm/pandoc/issues/6316
>
> 2. Add the vertical alignment style to attributes as suggested here
>
> 3. Add text indentation to attributes in the form of the style
> padding-left
>
> 4. Add cell shading and text color to attributes in the form of the
> styles background-color and color
>
> 5. Add table borders to attributes in the form of the style border
>
>
> Does this sound like a sane and feasible solution? We're pretty
> motivated and willing to work on these changes, just want to know if
> they would be the best route!

-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124