

Tokens - Unicode character escape sequences

C sharp






Tokens

There are several kinds of tokens: identifiers, keywords, literals, operators, and punctuators. White space and comments are not tokens, though they may act as separators for tokens.

token:
identifier
keyword
integer-literal
real-literal
character-literal
string-literal
operator-or-punctuator



Unicode character escape sequences

A Unicode character escape sequence represents a Unicode character. Unicode character escape sequences are processed in identifiers (2.4.2), character literals (2.4.4.4), and regular string literals (2.4.4.5). A Unicode character escape is not processed in any other location (for example, to form an operator, punctuator, or keyword).

unicode-escape-sequence:
\u hex-digit hex-digit hex-digit hex-digit
\U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit

A Unicode escape sequence represents the single Unicode character formed by the hexadecimal number following the "\u" or "\U" characters. Since C# uses a 16-bit encoding of Unicode characters in character and string values, a Unicode character in the range U+10000 to U+10FFFF is not permitted in a character literal and is represented using two Unicode surrogate characters in a string literal. Unicode characters with code points above 0x10FFFF are not supported.

Multiple translations are not performed. For instance, the string literal "\u005Cu005C" is equivalent to "\u005C" rather than "\". (The Unicode value \u005C is the character "\".)

The example

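class Class1
{
    static void Test(bool \u0066) {
        char c = '\u0066';
        if (\u0066)
            System.Console.WriteLine(c.ToString());
    }
}

shows several uses of \u0066, which is the character escape sequence for the letter "f". The program is equivalent to

class Class1
{
    static void Test(bool f) {
        char c = 'f';
        if (f)
            System.Console.WriteLine(c.ToString());
    }
}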
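The two rules stated earlier in this section (escapes are translated only once, and characters above U+FFFF become a surrogate pair inside a string) can be checked with a small sketch. The class and method names here are hypothetical and chosen only for illustration.

```csharp
using System;

public static class UnicodeEscapeDemo
{
    // The literal is scanned once: the escape at the start becomes a
    // backslash, and the "u005C" that follows is ordinary text. The value
    // is therefore six characters long, not a single backslash.
    public static string SinglePass()
    {
        return "\u005Cu005C";
    }

    // U+1F600 lies above U+FFFF, so the string stores it as two 16-bit
    // surrogate code units.
    public static string Supplementary()
    {
        return "\U0001F600";
    }

    // Recombines the surrogate pair at index 0 into one code point.
    public static int FirstCodePoint(string s)
    {
        return char.ConvertToUtf32(s, 0);
    }
}
```

Calling UnicodeEscapeDemo.Supplementary().Length shows that a single supplementary character occupies two char positions, which is why such a character cannot be written as a character literal.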

Identifiers

The rules for identifiers given in this section correspond exactly to those recommended by the Unicode 3.0 standard, Technical Report 15, Annex 7, except that underscore is allowed as an initial character (as is traditional in the C programming language), Unicode escape characters are permitted in identifiers, and the "@" character is allowed as a prefix to enable keywords to be used as identifiers.

identifier:
available-identifier
@ identifier-or-keyword

available-identifier:
An identifier-or-keyword that is not a keyword

identifier-or-keyword:
identifier-start-character identifier-part-characters opt

identifier-start-character:
letter-character
_ (the underscore character)

identifier-part-characters:
identifier-part-character
identifier-part-characters identifier-part-character

identifier-part-character:
letter-character
decimal-digit-character
connecting-character
combining-character
formatting-character

letter-character:
A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl

combining-character:
A Unicode character of classes Mn or Mc
A unicode-escape-sequence representing a character of classes Mn or Mc

decimal-digit-character:
A Unicode character of the class Nd
A unicode-escape-sequence representing a character of the class Nd

connecting-character:
A Unicode character of the class Pc
A unicode-escape-sequence representing a character of the class Pc

formatting-character:
A Unicode character of the class Cf
A unicode-escape-sequence representing a character of the class Cf

Examples of legal identifiers include "identifier1", "_identifier2", and "@if".

The prefix "@" enables the use of keywords as identifiers, which is useful when interfacing with other programming languages. The character @ is not actually part of the identifier, so the identifier might be seen in other languages as a normal identifier, without the prefix. Use of the @ prefix for identifiers that are not keywords is permitted, but strongly discouraged as a matter of style.

The example:

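class @class
{
    public static void @static(bool @bool) {
        if (@bool)
            System.Console.WriteLine("true");
        else
            System.Console.WriteLine("false");
    }
}

class Class1
{
    static void M() {
        cl\u0061ss.st\u0061tic(true);
    }
}

defines a class named "class" with a static method named "static" that takes a parameter named "bool". Note that since Unicode escapes are not permitted in keywords, the token "cl\u0061ss" is an identifier, and is the same identifier as "@class".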

Two identifiers are considered the same if they are identical after the following transformations are applied, in order:

The prefix "@", if used, is removed.

Each unicode-escape-sequence is transformed into its corresponding Unicode character.

Identifiers containing two consecutive underscore characters are reserved for use by the implementation. For example, an implementation may provide extended keywords that begin with two underscores.
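As a rough illustration of the identifier-equality rule, the sketch below declares locals using the @ prefix and a Unicode escape, then refers to them by a different spelling. The class and method names are hypothetical.

```csharp
public static class IdentifierDemo
{
    public static int Sum()
    {
        int @int = 1;        // the keyword int used as an identifier via the @ prefix
        int cl\u0061ss = 2;  // the escape stands for "a", so this declares the identifier "class"
        int @plain = 3;      // @ on a non-keyword is legal, though discouraged
        // @class names the local declared with the escape; plain names @plain.
        return @int + @class + plain;
    }

    // The @ prefix is not part of the name, so nameof reports it without the prefix.
    public static string NameWithoutPrefix()
    {
        string @if = "x";
        return nameof(@if) + ":" + @if;
    }
}
```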

Keywords

A keyword is an identifier-like sequence of characters that is reserved, and cannot be used as an identifier except when prefaced by the @ character.

keyword: one of
abstract as base bool break
byte case catch char checked
class const continue decimal default
delegate do double else enum
event explicit extern false finally
fixed float for foreach goto
if implicit in int interface
internal is lock long namespace
new null object operator out
override params private protected public
readonly ref return sbyte sealed
short sizeof stackalloc static string
struct switch this throw true
try typeof uint ulong unchecked
unsafe ushort using virtual void
volatile while

In some places in the grammar, specific identifiers have special meaning, but are not keywords. For example, within a property declaration, the "get" and "set" identifiers have special meaning (10.6.2). An identifier other than get or set is never permitted in these locations, so this use does not conflict with a use of these words as identifiers.
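A minimal sketch of this point, with hypothetical names: inside a property declaration get and set introduce the accessors, while everywhere else they behave as ordinary identifiers and need no @ prefix.

```csharp
public class ContextualDemo
{
    private int stored;

    // Here get and set introduce the property's accessors.
    public int Value
    {
        get { return stored; }
        set { stored = value; }
    }

    // Elsewhere get and set are ordinary identifiers.
    public static int Product()
    {
        int get = 2;
        int set = 3;
        return get * set;
    }
}
```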

Literals

A literal is a source code representation of a value.

literal:
boolean-literal
integer-literal
real-literal
character-literal
string-literal
null-literal

Boolean literals

There are two boolean literal values: true and false.

boolean-literal:
true
false

The type of a boolean-literal is bool.

Integer literals

Integer literals are used to write values of types int, uint, long, and ulong. Integer literals have two possible forms: decimal and hexadecimal.

integer-literal:
decimal-integer-literal
hexadecimal-integer-literal

decimal-integer-literal:
decimal-digits integer-type-suffix opt

decimal-digits:
decimal-digit
decimal-digits decimal-digit

decimal-digit: one of
0 1 2 3 4 5 6 7 8 9

integer-type-suffix: one of
U u L l UL Ul uL ul LU Lu lU lu

hexadecimal-integer-literal:
0x hex-digits integer-type-suffix opt
0X hex-digits integer-type-suffix opt

hex-digits:
hex-digit
hex-digits hex-digit

hex-digit: one of
0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f

The type of an integer literal is determined as follows:

If the literal has no suffix, it has the first of these types in which its value can be represented: int, uint, long, ulong.

If the literal is suffixed by U or u, it has the first of these types in which its value can be represented: uint, ulong.

If the literal is suffixed by L or l, it has the first of these types in which its value can be represented: long, ulong.

If the literal is suffixed by UL, Ul, uL, ul, LU, Lu, lU, or lu, it is of type ulong.

If the value represented by an integer literal is outside the range of the ulong type, an error occurs.

As a matter of style, it is suggested that "L" be used instead of "l" when writing literals of type long, since it is easy to confuse the letter "l" with the digit "1".

To permit the smallest possible int and long values to be written as decimal integer literals, the following two rules exist:

When a decimal-integer-literal with the value 2147483648 (2^31) and no integer-type-suffix appears as the token immediately following a unary minus operator token (7.6.2), the result is a constant of type int with the value −2147483648 (−2^31). In all other situations, such a decimal-integer-literal is of type uint.

When a decimal-integer-literal with the value 9223372036854775808 (2^63) and no integer-type-suffix appears as the token immediately following a unary minus operator token (7.6.2), the result is a constant of type long with the value −9223372036854775808 (−2^63). In all other situations, such a decimal-integer-literal is of type ulong.
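The typing rules above can be observed by boxing each literal and asking for its runtime type. This is only a sketch; IntegerLiteralDemo and its members are hypothetical names, and the reported names are the .NET type names (Int32, UInt32, Int64, UInt64) that correspond to int, uint, long, and ulong.

```csharp
public static class IntegerLiteralDemo
{
    // Boxing keeps the literal's compile-time type, so GetType reports it.
    public static string TypeNameOf(object boxed)
    {
        return boxed.GetType().Name;
    }

    public static string[] Unsuffixed()
    {
        return new string[]
        {
            TypeNameOf(2147483647),           // fits int
            TypeNameOf(2147483648),           // first fit is uint
            TypeNameOf(4294967296),           // first fit is long
            TypeNameOf(9223372036854775808),  // only ulong fits
            TypeNameOf(0xFFFFFFFF)            // hexadecimal follows the same rule: uint
        };
    }

    public static string[] Suffixed()
    {
        return new string[]
        {
            TypeNameOf(1U),           // uint
            TypeNameOf(4294967296U),  // too large for uint, so ulong
            TypeNameOf(1L),           // long
            TypeNameOf(1UL)           // ulong
        };
    }

    public static string[] MostNegative()
    {
        return new string[]
        {
            TypeNameOf(-2147483648),          // int, by the first special rule
            TypeNameOf(-9223372036854775808)  // long, by the second special rule
        };
    }
}
```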

Real literals

Real literals are used to write values of types float, double, and decimal.

real-literal:
decimal-digits . decimal-digits exponent-part opt real-type-suffix opt
. decimal-digits exponent-part opt real-type-suffix opt
decimal-digits exponent-part real-type-suffix opt
decimal-digits real-type-suffix

exponent-part:
e sign opt decimal-digits
E sign opt decimal-digits

sign: one of
+ -

real-type-suffix: one of
F f D d M m

If no real type suffix is specified, the type of the real literal is double. Otherwise, the real type suffix determines the type of the real literal, as follows:

A real literal suffixed by F or f is of type float. For example, the literals 1f, 1.5f, 1e10f, and 123.456F are all of type float.

A real literal suffixed by D or d is of type double. For example, the literals 1d, 1.5d, 1e10d, and 123.456D are all of type double.

A real literal suffixed by M or m is of type decimal. For example, the literals 1m, 1.5m, 1e10m, and 123.456M are all of type decimal.

If the specified literal cannot be represented in the indicated type, then a compile-time error occurs.

The value of a real literal is determined by using the IEEE "round to nearest" mode.
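The suffix rules for real literals can be sketched the same way, by boxing a literal and reading its runtime type. RealLiteralDemo is a hypothetical name; Single, Double, and Decimal are the .NET names for float, double, and decimal.

```csharp
public static class RealLiteralDemo
{
    // Boxing keeps the literal's compile-time type, so GetType reports it.
    public static string TypeNameOf(object boxed)
    {
        return boxed.GetType().Name;
    }

    public static string[] Names()
    {
        return new string[]
        {
            TypeNameOf(1.5),   // no suffix: double
            TypeNameOf(.5),    // the digits before the point may be omitted: double
            TypeNameOf(1e10),  // exponent without a suffix: double
            TypeNameOf(1f),    // float
            TypeNameOf(1.5d),  // double
            TypeNameOf(1.5m)   // decimal
        };
    }
}
```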

Character literals

A character literal represents a single character, and usually consists of a character in quotes, as in 'a'.

character-literal:
' character '

character:
single-character
simple-escape-sequence
hexadecimal-escape-sequence
unicode-escape-sequence

single-character:
Any character except ' (U+0027), \ (U+005C), and new-line-character

simple-escape-sequence: one of
\' \" \\ \0 \a \b \f \n \r \t \v

hexadecimal-escape-sequence:
\x hex-digit hex-digit opt hex-digit opt hex-digit opt

A character that follows a backslash character (\) in a character must be one of the following characters: ', ", \, 0, a, b, f, n, r, t, u, U, x, v. Otherwise, a compile-time error occurs.

A simple escape sequence represents a Unicode character encoding, as described in the table below.

Escape sequence   Character name    Unicode encoding
\'                Single quote      0x0027
\"                Double quote      0x0022
\\                Backslash         0x005C
\0                Null              0x0000
\a                Alert             0x0007
\b                Backspace         0x0008
\f                Form feed         0x000C
\n                New line          0x000A
\r                Carriage return   0x000D
\t                Horizontal tab    0x0009
\v                Vertical tab      0x000B

A hexadecimal escape sequence represents a single Unicode character, with the value formed by the hexadecimal number following "\x".

If the value represented by a character literal is greater than U+FFFF, an error occurs.

A Unicode character escape sequence (2.4.1) in a character literal must be in the range U+0000 to U+FFFF. Unicode characters in the range U+10000 to U+10FFFF are only permitted in string literals and are encoded as two Unicode "surrogate" characters.

The type of a character-literal is char.
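A short sketch of the three escape forms in character literals, using hypothetical names. It compares one letter written three ways and lists the code units produced by the simple escape sequences, in the order of the table above.

```csharp
public static class CharLiteralDemo
{
    // The same character written plainly, as a hexadecimal escape,
    // and as a Unicode escape.
    public static bool SameLetter()
    {
        return 'A' == '\x41' && 'A' == '\u0041';
    }

    // Code units of the simple escape sequences; each char converts
    // implicitly to int.
    public static int[] SimpleEscapeCodes()
    {
        return new int[]
        {
            '\'', '\"', '\\', '\0', '\a', '\b', '\f', '\n', '\r', '\t', '\v'
        };
    }
}
```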

String literals

C# supports two forms of string literals: regular string literals and verbatim string literals.

A regular string literal consists of zero or more characters enclosed in double quotes, as in "hello", and may include simple escape sequences (such as \t for the tab character), hexadecimal escape sequences, and Unicode escape sequences.

A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences, hexadecimal escape sequences, and Unicode character escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines.

string-literal:
regular-string-literal
verbatim-string-literal

regular-string-literal:
" regular-string-literal-characters opt "

regular-string-literal-characters:
regular-string-literal-character
regular-string-literal-characters regular-string-literal-character

regular-string-literal-character:
single-regular-string-literal-character
simple-escape-sequence
hexadecimal-escape-sequence
unicode-escape-sequence

single-regular-string-literal-character:
Any character except " (U+0022), \ (U+005C), and new-line-character

verbatim-string-literal:
@" verbatim-string-literal-characters opt "

verbatim-string-literal-characters:
verbatim-string-literal-character
verbatim-string-literal-characters verbatim-string-literal-character

verbatim-string-literal-character:
single-verbatim-string-literal-character
quote-escape-sequence

single-verbatim-string-literal-character:
any character except "

quote-escape-sequence:
""

A character that follows a backslash character (\) in a regular-string-literal-character must be one of the following characters: ', ", \, 0, a, b, f, n, r, t, u, U, x, v. Otherwise, a compile-time error occurs.

The example

string a = "hello, world";                  // hello, world
string b = @"hello, world";                 // hello, world

string c = "hello \t world";                // hello     world
string d = @"hello \t world";               // hello \t world

string e = "Joe said \"Hello\" to me";      // Joe said "Hello" to me
string f = @"Joe said ""Hello"" to me";     // Joe said "Hello" to me

string g = "\\\\server\\share\\file.txt";   // \\server\share\file.txt
string h = @"\\server\share\file.txt";      // \\server\share\file.txt

string i = "one\ntwo\nthree";
string j = @"one
two
three";

shows a variety of string literals. The last string literal, j, is a verbatim string literal that spans multiple lines. The characters between the quotation marks, including white space such as newline characters, are preserved verbatim.

Since a hexadecimal escape sequence can have a variable number of hex digits, the string literal "\x123" contains a single character with hex value 123. To create a string containing the character with hex value 12 followed by the character 3, one could write "\x00123" or "\x12" + "3" instead.
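The variable-length behaviour of hexadecimal escapes is easy to misread, so here is a small sketch of the three spellings just discussed. HexEscapeDemo and its method names are hypothetical.

```csharp
public static class HexEscapeDemo
{
    // The escape consumes all three hex digits: one character, U+0123.
    public static string Greedy()
    {
        return "\x123";
    }

    // The escape stops after four hex digits (0012), so the trailing
    // 3 is an ordinary character.
    public static string Padded()
    {
        return "\x00123";
    }

    // Splitting the literal ends the escape explicitly.
    public static string Split()
    {
        return "\x12" + "3";
    }
}
```

Padded and Split produce the same two-character string, while Greedy produces a single character.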

The type of a string-literal is string.

Each string literal does not necessarily result in a new string instance. When two or more string literals that are equivalent according to the string equality operator (7.9.7) appear in the same assembly, these string literals refer to the same string instance. For instance, the output of the program

class Test
{
    static void Main() {
        object a = "hello";
        object b = "hello";
        System.Console.WriteLine(a == b);
    }
}

is True because the two literals refer to the same string instance.

The null literal

null-literal:
null

The type of a null-literal is the null type.

Operators and punctuators

There are several kinds of operators and punctuators. Operators are used in expressions to describe operations involving one or more operands. For example, the expression a + b uses the + operator to add the two operands a and b. Punctuators are for grouping and separating.

operator-or-punctuator: one of
{ } [ ] ( ) . , : ;
+ - * / % & | ^ ! ~
= < > ? ++ -- && || << >>
== != <= >= += -= *= /= %= &=
|= ^= <<= >>= ->


