[nem-en] Language support and BOM
Alexey Borzenkov
snaury at gmail.com
Tue Jan 30 20:47:49 CET 2007
Haven't tried with UTF-32, but for UTF-16, yes, it does (on MS.NET 2.0). And
of course it would. :) Omitting the third parameter is the same as passing
true there. When in doubt, Reflector to the rescue. ;)
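
For illustration, a minimal sketch of relying on the default BOM detection while keeping strict UTF-8 as the fallback. The file name "1.txt" is only a placeholder, and the third argument is passed explicitly just to make the default visible:

```nemerle
using System;
using System.IO;
using System.Text;

// Third argument (detectEncodingFromByteOrderMarks) is true here;
// leaving it out gives the same behaviour.
def reader = StreamReader("1.txt", UTF8Encoding(true, true), true);
// The encoding is only detected once something has been read.
def text = reader.ReadToEnd();
Console.WriteLine(reader.CurrentEncoding.WebName);
```

Reading CurrentEncoding before the first read would just report the encoding passed to the constructor.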
On 1/30/07, Kamil Skalski <kamil.skalski at gmail.com> wrote:
>
> Hm, but does it detect utf-16 / 32 by BOM without third parameter to
> StreamReader?
>
> 2007/1/30, Alexey Borzenkov <snaury at gmail.com>:
> > Oh, btw, I just found a better way for creating proper UTF8 encoding:
> >
> > def file = StreamReader(filename, UTF8Encoding(true,true));
> >
> > It's important to set both parameters to true: this ensures that the
> > encoding has a proper preamble, which StreamReader will match first,
> > so invalid sequences are caught even when the file has a BOM. And if
> > you want to ignore invalid sequences when the file has a BOM (but still
> > want to catch them when the file has no BOM), then just use
> > UTF8Encoding(false,true)...
> >
> >
> > On 1/30/07, Alexey Borzenkov <snaury at gmail.com> wrote:
> > > Hi Kamil,
> > >
> > > What's the point in detecting the BOM manually? Implementations of
> > > IO.StreamReader *must* do that on their own, and it's even more important
> > > that they can evolve and detect future BOM cases that don't exist these
> > > days, something you won't be able to foretell in your implementation (or
> > > you will have to constantly change it). Check this:
> > >
> > > using System;
> > > using System.IO;
> > > using System.Text;
> > >
> > > def file = StreamReader("1.txt",
> > >   Encoding.GetEncoding(Encoding.UTF8.CodePage, EncoderExceptionFallback(),
> > >   DecoderExceptionFallback()));
> > > def text = file.ReadToEnd();
> > > Console.WriteLine(file.CurrentEncoding.HeaderName);
> > >
> > > It detects BOMs very well (at least on MS.NET; I haven't checked with
> > > Mono, but if it's not working with Mono, it's a Mono bug), and throws
> > > exceptions when a BOM-less file has non-UTF-8 sequences (and it seems it
> > > throws on invalid sequences even when the file actually has a BOM).
> > >
> > > Hint (just in case): StreamReader.CurrentEncoding is *not* detected
> > > until you actually read at least one character from the file. :)
> > >
> > >
> > > On 1/30/07, Kamil Skalski <kamil.skalski at gmail.com> wrote:
> > > >
> > > > Ok, I need volunteers using various codepages to test the attached BOM
> > > > parsing / encoding enforcement program.
> > > >
> > > > You must save it as t.n, edit the comment at the beginning to use your
> > > > country's native characters, compile it, and run the executable.
> > > >
> > > > We need the following confirmations:
> > > > - program fails when you process t.n saved in a non-UTF codepage
> > > > - program runs fine with the same file saved as UTF-8 (with and
> > > >   without BOM)
> > > > - you can also test other UTFs, but they must always have an explicit BOM
> > > >
> > > > UTF32 Big Endian is unfortunately not supported by mono, so we will
> > > > skip it at the moment.
> > > >
> > > >
> > > > 07-01-30, Michal Moskal <michal.moskal at gmail.com> wrote:
> > > > > On 1/30/07, vc <vc at rsdn.ru> wrote:
> > > > > > > On Behalf Of Michal Moskal
> > > > > >
> > > > > >
> > > > > > > > But, in my opinion, UTF-16 should also be supported.
> > > > > > >
> > > > > > > But with BOM, then UTF-32 should also be fine.
> > > > > >
> > > > > > Well...
> > > > > >
> > > > > > But the old problem remains: we can't warn the user if they use a
> > > > > > "non-UTF file".
> > > > >
> > > > > How come? We would just reject non-utf8 file with no BOM.
> > > > >
> > > > > --
> > > > > Michał
> > > > >
> > > > > _______________________________________________
> > > > > https://nemerle.org/mailman/listinfo/devel-en
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Kamil Skalski
> > > > http://nazgul.omega.pl