[nem-en] Re: Language support and BOM

Andrey Khropov andrey.khropov at gmail.com
Mon Jan 29 23:42:47 CET 2007


Kamil Skalski wrote:


> I do not agree with this. The rule is quite simple - you must use
> utf8. This way we:
> - have it standardized, so developers can exchange files without
> problems or having to worry about converting
> - do not support deprecated encodings, which simply break when not run
> on "my machine"
> - keep the implementation simple and reliable
> 
> Auto detection is good when you need to deal with large amount of data
> coming from non-programmers using lame soft.

100% agree. Unicode is the standard in the .NET world.

Here is an excerpt from the CLI standard:

-------------------------------------------------------------------------
CLS Rule 4: Assemblies shall follow Annex 7 of Technical Report 15 of the
Unicode Standard 3.0 governing the set of characters permitted to start and
be included in identifiers, available on-line at
http://www.unicode.org/unicode/reports/tr15/tr15-18.html. Identifiers shall
be in the canonical format defined by Unicode Normalization Form C. For CLS
purposes, two identifiers are the same if their lowercase mappings (as
specified by the Unicode locale-insensitive, one-to-one lowercase mappings)
are the same. That is, for two identifiers to be considered different under
the CLS they shall differ in more than simply their case. However, in order
to override an inherited definition the CLI requires the precise encoding of
the original declaration be used.
-------------------------------------------------------------------------
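To make the rule concrete, here is a rough sketch in Python (not NCC code) of comparing two identifiers the way CLS Rule 4 describes: normalize to Form C, then compare lowercase mappings. The helper names cls_key and same_cls_identifier are made up for illustration, and Python's str.lower() is only an approximation of the "locale-insensitive, one-to-one lowercase mappings" the rule refers to.

```python
import unicodedata


def cls_key(identifier: str) -> str:
    """Approximate CLS comparison key: NFC-normalize, then lowercase.

    Assumption: str.lower() stands in for the one-to-one lowercase
    mappings named in the rule; it is not guaranteed to match them
    for every character.
    """
    return unicodedata.normalize("NFC", identifier).lower()


def same_cls_identifier(a: str, b: str) -> bool:
    """True if the two identifiers would clash under this approximation."""
    return cls_key(a) == cls_key(b)


# "e with acute" as one code point (U+00E9) vs. "e" + combining acute (U+0301)
composed = "caf\u00e9"
decomposed = "cafe\u0301"

print(composed == decomposed)                     # False: different code points
print(same_cls_identifier(composed, decomposed))  # True: same after NFC
print(same_cls_identifier("Count", "count"))      # True: differ only in case
print(same_cls_identifier("count", "counter"))    # False
```

Without the normalization step, the two spellings of the same visible name would be treated as different identifiers.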

and from the C# standard:

-------------------------------------------------------------------------
A conforming implementation of C# shall interpret characters in conformance
with the Unicode Standard Version 4.0 and ISO/IEC 10646-1. Conforming
implementations must accept Unicode source files encoded with the UTF-8
encoding form.
-------------------------------------------------------------------------
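A "UTF-8 only" reader is also very little code. The following is a minimal sketch in Python, assuming the policy is: skip an optional UTF-8 BOM, decode strictly, and reject everything else. The function name read_source and the sample inputs are hypothetical; this is not how NCC actually reads files.

```python
import codecs


def read_source(data: bytes) -> str:
    """Decode source bytes as UTF-8, skipping an optional leading BOM."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Strict decoding raises UnicodeDecodeError on anything that is
    # not valid UTF-8, instead of silently guessing an encoding.
    return data.decode("utf-8", errors="strict")


# The same text with and without a BOM decodes to the same string.
text = "def x = 1;"
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
without_bom = text.encode("utf-8")
print(read_source(with_bom) == read_source(without_bom))  # True

# A file saved in a legacy code page is rejected up front.
legacy = "привет".encode("cp1251")
try:
    read_source(legacy)
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)  # True
```

The point of the design is that there is exactly one code path, and a wrongly encoded file fails loudly rather than compiling with garbled identifiers or string literals.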

Using any other, deprecated encoding is asking for trouble, and it isn't worth
it just to save a few bytes.

Besides, it would require NCC either to recode all the files to Unicode on the
fly or to deal with the idiosyncrasies of all the different encodings in the
parser, which would of course slow down compilation.
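For anyone who has not been bitten by this yet, here is a small Python sketch of the "my machine" problem. The sample word and the three code pages are picked arbitrarily for illustration: the same bytes turn into different text depending on which legacy encoding the reader assumes, while the UTF-8 bytes leave nothing to guess.

```python
word = "привет"

# Bytes as an editor would write them on a machine defaulting to cp1251.
raw = word.encode("cp1251")

# What three differently configured machines would read back.
decoded = {enc: raw.decode(enc) for enc in ("cp1251", "latin-1", "koi8-r")}
for enc, result in decoded.items():
    print(enc, "->", result)

# Only the reader that happens to guess cp1251 recovers the original word.
print(decoded["cp1251"] == word)   # True
print(decoded["latin-1"] == word)  # False

# With UTF-8 there is a single agreed interpretation of the bytes.
utf8_roundtrip = word.encode("utf-8").decode("utf-8")
print(utf8_roundtrip == word)      # True
```

None of the wrong guesses raises an error, which is what makes auto detection and per-file encodings so unreliable for source code.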

-- 
AKhropov



