List for cgit developers and users
From: john at keeping.me.uk ('John Keeping')
Subject: Encoding problem
Date: Sun, 6 Oct 2013 11:46:33 +0100
Message-ID: <20131006104633.GI27238@serenity.lan>
In-Reply-To: <002101cec1b6$40b20420$c2160c60$@jorge@decimal.pt>

On Sat, Oct 05, 2013 at 11:32:54AM +0100, Jorge Bastos wrote:
> > On Sat, Sep 28, 2013 at 12:19:38AM +0100, Jorge Bastos wrote:
> > > Is it possible to define charset in cgitrc?
> > >
> > > I'm having encoding problems in the frontend, in the latest version
> > > 1.8.4 from version 0.9.2, and now non-ascii chars are shown with ??
> > > or some other char instead of the correct one.
> > >
> > > Is there a charset option for cgit ? I can't find it.
> > 
> > The charset is hardcoded to "UTF-8", which should be the default
> > encoding for Git commit messages and CGit does attempt to transcode Git
> > messages to the correct encoding.
> > 
> > Are you seeing '??' in the commit message or in blob/tree content?
> > 
> > Do you have a public repository that is exhibiting these symptoms?
> 
> I was checking and the file in question was indeed in ANSI, changed the file
> encoding to utf8 and it's OK.
> Anyway, I have gitweb installed side-by-side, and in gitweb it was shown
> correctly.
> 
> I have other places where chars are not shown OK but didn't reach any
> conclusion about the file encoding, I'll tell you later,

I've had another look at this, and Gitweb is doing this for all data it
outputs:

    # decode sequences of octets in utf8 into Perl's internal form,
    # which is utf-8 with utf8 flag set if needed.  gitweb writes out
    # in utf-8 thanks to "binmode STDOUT, ':utf8'" at beginning
    sub to_utf8 {
        my $str = shift;
        return undef unless defined $str;

        if (utf8::is_utf8($str) || utf8::decode($str)) {
            return $str;
        } else {
            return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
        }
    }

Do you know what the fallback encoding on your Gitweb installation is?
(The default is 'latin1').
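
To illustrate the effect of that fallback, here is a minimal sketch, assuming a fallback encoding of 'latin1'. The name to_chars and the sample strings are made up for this example and are not part of Gitweb. The same word arrives once as UTF-8 bytes and once as Latin-1 bytes, and both end up as the same character string:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode;

# Assumption for this sketch: mirrors Gitweb's default fallback setting.
my $fallback_encoding = 'latin1';

# Hypothetical helper with the same logic as to_utf8 above: try UTF-8
# first, and fall back only when the bytes are not well-formed UTF-8.
sub to_chars {
    my $bytes = shift;
    return undef unless defined $bytes;
    return $bytes if utf8::is_utf8($bytes) || utf8::decode($bytes);
    return decode($fallback_encoding, $bytes, Encode::FB_DEFAULT);
}

my $from_utf8   = to_chars("caf\xC3\xA9");   # e-acute as two UTF-8 bytes
my $from_latin1 = to_chars("caf\xE9");       # e-acute as one Latin-1 byte

# Both inputs decode to the same four-character string.
binmode STDOUT, ':utf8';
print $from_utf8 eq $from_latin1 ? "same\n" : "different\n";
```

A file saved in a Latin-1-style encoding therefore still displays correctly in Gitweb, which would explain the difference you saw.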

If you're not using any other source filter with CGit, you should get
the same result by configuring the script at the end of this message as
"source-filter" in your cgitrc file.
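
As a sketch, assuming the script is saved and made executable at the hypothetical path /usr/local/lib/cgit/filters/to-utf8.pl, the cgitrc entry would look something like this:

```text
# Hypothetical path; point this at wherever the script is installed.
source-filter=/usr/local/lib/cgit/filters/to-utf8.pl
```

Per the cgitrc man page, the filter receives the blob content on STDIN and its output is included verbatim as the blob contents, so content containing HTML metacharacters may need escaping as well.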

We'll still get it wrong in "plain" view though, since we
unconditionally set the charset to UTF-8 there and dump the content out
raw. That can be tweaked in the config file, but it looks like we get
that wrong too and unconditionally append a "charset=" to the MIME type,
even for binary types.

-- >8 --
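#!/usr/bin/perl
# Source filter for CGit: emit blob content as UTF-8, falling back to
# Latin-1 when the input is not well-formed UTF-8.
use strict;
use warnings;
use Encode;

# Encode everything written to STDOUT as UTF-8.
binmode STDOUT, ':utf8';

# Slurp the whole blob from STDIN.
my $str = do { local $/; <STDIN> };

# utf8::decode() decodes in place and returns false, leaving $str
# untouched, if the input is not well-formed UTF-8.
if (utf8::decode($str)) {
        print $str;
} else {
        # Same fallback as Gitweb's default $fallback_encoding.
        print decode('latin1', $str, Encode::FB_DEFAULT);
}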

