List for cgit developers and users
* Encoding problem
From: mysql.jorge @ 2013-09-27 23:19 UTC


Hi,

Is it possible to define the charset in cgitrc?

I'm having encoding problems in the frontend, in the latest version 1.8.4
from version 0.9.2, and now non-ASCII characters are shown as ?? or some
other character instead of the correct one.

Is there a charset option for cgit? I can't find it.

Thanks in advance,

Jorge Bastos


* Encoding problem
From: john @ 2013-09-30 10:33 UTC


On Sat, Sep 28, 2013 at 12:19:38AM +0100, Jorge Bastos wrote:
> Is it possible to define charset in cgitrc?
> 
> I'm having encoding problems in the frontend, in the latest version 1.8.4
> from version 0.9.2, and now non-ascii chars are shown with ?? or some other
> char instead of the correct one.
>
> Is there a charset option for cgit ? I can't find it.

The charset is hardcoded to "UTF-8", which should be the default
encoding for Git commit messages and CGit does attempt to transcode Git
messages to the correct encoding.
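A minimal Perl sketch of the transcoding described above, assuming the raw commit object is available as octets. Git records a non-UTF-8 commit message encoding in an optional "encoding" header, and a commit without that header is treated as UTF-8. The helper name commit_message_utf8 is hypothetical; this illustrates the idea and is not cgit's own (C) implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);

# Hypothetical helper: return a commit object's message re-encoded as UTF-8
# octets, honouring the optional "encoding" header of the raw commit.
sub commit_message_utf8 {
    my ($raw) = @_;

    # Headers end at the first empty line; the message follows it.
    my ($headers, $message) = split /\n\n/, $raw, 2;
    $message = '' unless defined $message;

    # No "encoding" header means the message is already UTF-8.
    my $charset = 'UTF-8';
    for my $line (split /\n/, $headers) {
        $charset = $1 if $line =~ /^encoding (\S+)$/;
    }
    # Fall back to UTF-8 if Encode does not know the declared name.
    $charset = 'UTF-8' unless find_encoding($charset);

    # Decode from the declared charset (malformed input is replaced with a
    # substitution character), then emit UTF-8 octets for the page.
    return encode('UTF-8', decode($charset, $message, Encode::FB_DEFAULT));
}

# A Latin-1 commit message: 0xE9 is e-acute in ISO-8859-1.
my $raw = "tree 0000\nencoding ISO-8859-1\n\ncaf\xe9\n";
print commit_message_utf8($raw);    # prints the message re-encoded as UTF-8
```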

Are you seeing '??' in the commit message or in blob/tree content?

Do you have a public repository that is exhibiting these symptoms?



* Encoding problem
From: mysql.jorge @ 2013-10-05 10:32 UTC


> On Sat, Sep 28, 2013 at 12:19:38AM +0100, Jorge Bastos wrote:
> > Is it possible to define charset in cgitrc?
> >
> > I'm having encoding problems in the frontend, in the latest version
> > 1.8.4 from version 0.9.2, and now non-ascii chars are shown with ?? or
> > some other char instead of the correct one.
> >
> > Is there a charset option for cgit ? I can't find it.
> 
> The charset is hardcoded to "UTF-8", which should be the default
> encoding for Git commit messages and CGit does attempt to transcode Git
> messages to the correct encoding.
> 
> Are you seeing '??' in the commit message or in blob/tree content?
> 
> Do you have a public repository that is exhibiting these symptoms?

Hi John,

I checked, and the file in question was indeed in ANSI; after changing the
file encoding to UTF-8 it's OK.
That said, I have gitweb installed side-by-side, and gitweb showed it
correctly.

I have other places where characters are not shown correctly, but I haven't
reached any conclusion about their file encoding yet. I'll tell you later.
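A minimal Perl sketch for checking the remaining files: test whether a file's bytes are valid UTF-8 and, if not, re-encode them. It assumes "ANSI" means the Windows-1252 code page (common on Western European Windows systems, although the actual code page depends on the system locale); the helper names is_valid_utf8 and to_utf8_octets are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: true if the octets are well-formed (strict) UTF-8.
sub is_valid_utf8 {
    my ($octets) = @_;
    # FB_CROAK makes decode() die on the first malformed sequence;
    # LEAVE_SRC stops it from consuming the input string.
    my $ok = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Hypothetical helper: re-encode Windows-1252 ("ANSI", assumed) octets as
# UTF-8, leaving input that is already valid UTF-8 unchanged.
sub to_utf8_octets {
    my ($octets) = @_;
    return $octets if is_valid_utf8($octets);
    return encode('UTF-8', decode('cp1252', $octets));
}

# 0xE1 is a-acute in Windows-1252; this prints the word re-encoded as UTF-8.
print to_utf8_octets("Ol\xe1\n");
```

To convert a file, read it in raw mode, write the result of to_utf8_octets() back out, and commit the re-encoded file so that every frontend sees UTF-8.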

* Encoding problem
From: john @ 2013-10-06 10:46 UTC


On Sat, Oct 05, 2013 at 11:32:54AM +0100, Jorge Bastos wrote:
> > On Sat, Sep 28, 2013 at 12:19:38AM +0100, Jorge Bastos wrote:
> > > Is it possible to define charset in cgitrc?
> > >
> > > I'm having encoding problems in the frontend, in the latest version
> > > 1.8.4 from version 0.9.2, and now non-ascii chars are shown with ?? or
> > > some other char instead of the correct one.
> > >
> > > Is there a charset option for cgit ? I can't find it.
> > 
> > The charset is hardcoded to "UTF-8", which should be the default
> > encoding for Git commit messages and CGit does attempt to transcode Git
> > messages to the correct encoding.
> > 
> > Are you seeing '??' in the commit message or in blob/tree content?
> > 
> > Do you have a public repository that is exhibiting these symptoms?
> 
> I checked, and the file in question was indeed in ANSI; after changing the
> file encoding to UTF-8 it's OK.
> That said, I have gitweb installed side-by-side, and gitweb showed it
> correctly.
>
> I have other places where characters are not shown correctly, but I haven't
> reached any conclusion about their file encoding yet. I'll tell you later.

I've had another look at this, and Gitweb is doing this for all data it
outputs:

    # decode sequences of octets in utf8 into Perl's internal form,
    # which is utf-8 with utf8 flag set if needed.  gitweb writes out
    # in utf-8 thanks to "binmode STDOUT, ':utf8'" at beginning
    sub to_utf8 {
        my $str = shift;
        return undef unless defined $str;

        if (utf8::is_utf8($str) || utf8::decode($str)) {
            return $str;
        } else {
            return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
        }
    }

Do you know what the fallback encoding on your Gitweb installation is?
(The default is 'latin1').

If you're not using any other source filter with CGit, you should get
the same result by configuring the following script as "source-filter"
in your cgitrc file.

We'll still get it wrong in "plain" view though, since we
unconditionally set the charset to UTF-8 there and dump the content out
raw; that can be tweaked in the config file but it looks like we get
that wrong and unconditionally append a "charset=" to the MIME type even
for binary types.
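A sketch of the header logic the paragraph above points towards, written in Perl for consistency with the rest of the thread rather than in cgit's C: append a charset parameter only to textual MIME types, and leave alone a MIME type string that already carries its own charset. The helper name content_type is hypothetical, and treating only text/* as textual is a simplifying assumption (some application/* types, such as XML or JSON, are text too).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a Content-Type value, appending a charset only
# to text/* types that do not already declare one, so binary blobs are not
# labelled with a character set.
sub content_type {
    my ($mimetype, $charset) = @_;
    if (defined $charset
        && $mimetype =~ m{^text/}i
        && $mimetype !~ /;\s*charset=/i) {
        return "$mimetype; charset=$charset";
    }
    return $mimetype;
}

print "Content-Type: ", content_type('text/plain', 'UTF-8'), "\n";
print "Content-Type: ", content_type('application/octet-stream', 'UTF-8'), "\n";
```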

-- >8 --
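#!/usr/bin/perl
# cgit source-filter: the blob arrives on STDIN and its name in $ARGV[0];
# whatever is printed to STDOUT is inserted into the page verbatim.
use strict;
use warnings;
use Encode;

binmode STDOUT, ':utf8';

# Slurp the whole blob as raw octets.
my $str = do { local $/; <STDIN> };

# Like gitweb's to_utf8: keep valid UTF-8, otherwise assume latin1.
unless (utf8::decode($str)) {
        $str = decode('latin1', $str, Encode::FB_DEFAULT);
}

# The output is used as HTML, so escape markup characters.
$str =~ s/&/&amp;/g;
$str =~ s/</&lt;/g;
$str =~ s/>/&gt;/g;

print $str;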
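To enable the script above, save it somewhere cgit can execute it and point "source-filter" at it in cgitrc; the path below is only an example. According to the cgitrc documentation, a source filter receives the blob on STDIN and the blob's name as its only argument, and its STDOUT is included verbatim as the blob contents in the tree view, so it affects file contents but not commit messages or the "plain" view.

```
# Example path only; the script must be executable (chmod +x).
source-filter=/usr/local/lib/cgit/filters/utf8-fallback.pl
```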




Thread overview: 4+ messages
2013-09-27 23:19 Encoding problem mysql.jorge
2013-09-30 10:33 ` john
2013-10-05 10:32   ` mysql.jorge
2013-10-06 10:46     ` john
