The CLR as a Better COM

My first book, Essential COM (Addison-Wesley 1998), began by describing the problems that plagued the pre-COM world and then proceeded to describe an architecture for solving those problems. Because the CLR is the heir apparent to COM, it seems fitting that this book begin with a similar treatment, this time looking at the world that COM has wrought.



COM Revisited

Component technologies focus on the contracts between independently developed and deployed programs. The Component Object Model (COM) was Microsoft's first attempt at formalizing these contracts both as a design paradigm and through supporting platform technology. The design paradigm of COM was that component contracts are expressed as type definitions. This was a step forward from the world COM replaced, in which contracts were expressed only as simple functional entry points. In this respect, COM was a major advance because it brought the dynamic loading of code and the type system together in a fairly self-consistent manner.

The COM programming model itself has stood the test of time extremely well. COM combined existing ideas such as encapsulation, polymorphism, and separation of interface from implementation into a unified programming discipline that has left an indelible mark on the field of software engineering. Rather than rehash that model here, I will point you to Chapter 1 of either Essential COM or Design Patterns (Erich Gamma et al., Addison-Wesley 1995) for two very different descriptions of essentially the same programming model.

Remember, however, that COM is both a programming model and a supporting platform technology. On that latter front, COM has not held up nearly as well as the programming model I know and love. Unfortunately, a solid platform technology is needed to make COM more than just an idea or programming discipline. For that reason, the COM era is coming to a close.

Most if not all of the problems with the COM platform can be traced back to the nature of contracts between components. In an ideal world, the contracts between components would be expressed purely in terms of the semantic guarantees and assumptions that exist between the consumer and the component. Unfortunately, the field of software engineering has yet to define a way to express semantics that has been proven commercially (or technically) viable for large-scale industry-wide deployment. The closest we have come as a profession is to use programmatic type definitions along with human-readable documentation that describes the semantics of those types. This is how it was done before COM. This is how it will be done long after the last COM component on earth is finally wiped out of existence.

COM expressed component contracts in terms of types; however, a component contract in COM had two key problems that made COM-based contracts sub-optimal for expressing semantics. One of these problems was related to the description of a COM contract. The other problem was related to the contract itself.

The first problem with COM relates to how contracts are described. The COM specification bent over backwards to avoid mandating an interchange format for contract definitions. This meant that if one adhered to only the COM specification, there was no standardized way to describe a contract; rather, the COM specification assumed that the type definitions of a contract would be communicated via some out-of-band technique that was outside the scope of COM proper. Of course, this is viable only in the world of specifications. To make COM a useful technology, a concrete solution was needed; otherwise, it would have been impossible to build compilers, tools, and supporting infrastructure.

Microsoft defined and supported not one but two interchange formats for COM contract descriptions: Interface Definition Language (IDL) and type library (TLB) files. That in itself would not have been a problem; however, the two formats were not isomorphic. That is, there were constructs that could be expressed in one format that had no meaningful representation in the other format. Worse, the constructs supported by one format were not a proper subset of the constructs supported by the other, so it was impossible to view either of these formats as the 'authoritative' or 'normative' format for contract descriptions.

An argument could have been made for simply defining a third format based on the union of constructs supported by both formats. However, there were at least two other critical problems with the way contracts were described in COM. For one, COM made no attempt to describe component dependencies. At the time of this writing, there is no way to walk up to a COM component (or its contract definition) and determine which other components are required to make this component work. The lack of dependency information made it difficult to determine which DLLs would be needed to deploy a COM-based application. This also made it impossible to statically determine which versions of a component were needed, which made diagnosing versioning problems extremely difficult.

The second major problem, and the ultimate death knell for COM's contract description format, was its lack of extensibility. In the mid-1990s, the Microsoft Transaction Server (MTS) team was working on a new programming model based on the ideas now known as aspect-oriented programming (AOP). AOP takes aspects of the code that are not domain-specific and hoists them out of the developer's source code. AOP-based systems rely on alternative mechanisms for declaring these aspects to make the intention of the programmer explicit rather than implicit.

The MTS team wanted to allow developers to express their requirements for concurrency, transactioning, and security as aspects rather than as calls to API functions. Because of the broad adoption of COM, the MTS developers used augmentations to COM contract descriptions as the mechanism for expressing these aspects. Developers using MTS simply annotate their COM class, interface, and method definitions with attributes that inform the MTS executive of the requirements and assumptions of the underlying code. To make these attributes useful, the MTS executive replaced the COM loader and injected interceptors based on the aspects of the class being loaded. The MTS interceptor (called a context wrapper) would do whatever work was necessary to ensure that the developer's assumptions were met prior to dispatching the method call. As a point of interest, this model of using declarative attributes and interception was later used as the basis for Enterprise Java Beans, an homage to MTS from Sun Microsystems.
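The attribute-plus-interception idea can be sketched in C#. This is a minimal illustration rather than the MTS API: RequiresTransactionAttribute, Account, and ContextWrapper are hypothetical names, and the "transaction" is only a log entry. The wrapper inspects the declared aspect of a method and does the surrounding work before and after dispatching the call.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute: declares that a method must run inside a transaction.
[AttributeUsage(AttributeTargets.Method)]
public sealed class RequiresTransactionAttribute : Attribute { }

public class Account
{
    public int Balance;

    [RequiresTransaction]
    public void Deposit(int amount) { Balance += amount; }

    public int GetBalance() { return Balance; }
}

// Hypothetical stand-in for a context wrapper: it reads the declared
// aspects of a method and performs the required work around the call.
public static class ContextWrapper
{
    public static readonly List<string> Log = new List<string>();

    public static object Invoke(object target, string methodName, params object[] args)
    {
        MethodInfo method = target.GetType().GetMethod(methodName);
        if (method == null)
            throw new MissingMethodException(target.GetType().Name, methodName);

        bool transactional =
            Attribute.IsDefined(method, typeof(RequiresTransactionAttribute));

        if (transactional) Log.Add("begin transaction");
        try
        {
            object result = method.Invoke(target, args);
            if (transactional) Log.Add("commit");
            return result;
        }
        catch (TargetInvocationException)
        {
            // The target method threw; record the abort and rethrow.
            if (transactional) Log.Add("abort");
            throw;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var account = new Account();
        ContextWrapper.Invoke(account, "Deposit", 100);
        Console.WriteLine(ContextWrapper.Invoke(account, "GetBalance"));
        Console.WriteLine(string.Join(", ", ContextWrapper.Log));
    }
}
```

The domain code in Account contains no transaction calls at all; the requirement is stated declaratively and acted on by infrastructure, which is the separation the MTS team was after.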

Unfortunately, the MTS team couldn't rely on either of COM's contract formats as a reliable way to convey and store attributes. One of the contract formats, Interface Definition Language (IDL), was a text-based format that was rarely deployed with the component itself. Moreover, IDL was typically used only by C++ developers, which meant that IDL-based contract definitions were largely useless given the relatively small number of C++ developers building the enterprise systems for which MTS was designed. The second format, type library (TLB) files, had very rudimentary (and buggy) extensibility hooks. The ultimate downfall of TLBs, however, was the fact that the mainstream developer using Visual Basic had no way to directly influence the TLB. Rather, the Visual Basic IDE and compiler insulated the developer from TLB generation, making it impossible for VB programmers to specify MTS attributes during the development process. Although VB 6.0 finally added support for one of the MTS attributes (Transaction), the MTS team was beholden to the VB team to make its technology available to the masses. Understandably, the VB team had its own agenda, and that caused the MTS and COM teams to abandon TLBs and IDL once and for all and define a new contract definition format that would be extensible in a cleaner, more accessible way than TLBs. That new contract format is the focus of most of the remainder of this book.

The previous discussion focused on the problems with how component contracts are described. Even if a perfect unified description format were to emerge, COM would still have a fundamental problem with the way contracts work. That problem has nothing to do with the way the contract is described. Rather, the problem is rooted deeply in the contract itself.

A component contract in COM is based on type descriptions. The type system used in these contracts is based on a subset of C++ that is guaranteed to be portable across compilers. This portability guarantee is not just in terms of the lexical programming language. Rather, the portability guarantee is in terms of the data representations used by most modern compilers. And therein lies the problem.

A component contract in COM is a physical (also known as binary) contract. That is, a COM component has hard requirements on how intercomponent invocations must work. A COM contract mandates precise vtable offsets for every method. A COM contract mandates the exact stack discipline (e.g., __stdcall) to use during method invocation. A COM contract mandates the exact offset of every data structure that is passed as a method parameter. A COM contract mandates exactly which memory allocator to use for callee-allocated memory. A COM contract mandates the exact format of an object reference (called an interface pointer), including the exact format of the vptr and vtbl to be used. As far as the underlying technology of COM is concerned, a component contract is ultimately just a protocol for forming stack frames in memory, utterly free of semantic content.

The physical nature of a component contract in COM has its downsides. For one, a considerable amount of attention to detail is needed to make sure things work properly. This made COM a difficult technology to use even for developers with above-average attention spans, let alone casual programmers. Attempts by tool developers to hide this complexity have only compounded the problem, as any VB programmer who has dealt with VB's binary compatibility mode can attest.

The physical nature of a COM component contract is especially problematic in the face of component versioning. Versioning is hard enough when only semantic changes must be accounted for. When minute details such as vtable ordering or field alignment cause runtime errors, it only makes the problem worse. Granted, the precision of a contract definition in COM allowed for extremely efficient code to be generated; however, the brittleness exhibited by this code was arguably an unacceptable trade-off.
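The brittleness of a physical contract can be observed from C# by looking at the unmanaged layout of a structure. The following is a minimal sketch, assuming two hypothetical versions of a structure (PointV1 and PointV2) and using the Marshal class to read raw offsets. A consumer that baked in the version 1 offset of X reads the wrong field once a new field is inserted ahead of it.

```csharp
using System;
using System.Runtime.InteropServices;

// Version 1 of a hypothetical structure passed across a binary boundary.
[StructLayout(LayoutKind.Sequential)]
public struct PointV1
{
    public int X;
    public int Y;
}

// Version 2 inserts a field at the front, so every later field moves.
[StructLayout(LayoutKind.Sequential)]
public struct PointV2
{
    public int Flags;
    public int X;
    public int Y;
}

public static class LayoutDemo
{
    // Simulates a consumer compiled against V1 reading memory laid out as V2.
    public static int ReadXWithStaleOffset()
    {
        var v2 = new PointV2 { Flags = 7, X = 10, Y = 20 };
        IntPtr buffer = Marshal.AllocHGlobal(Marshal.SizeOf<PointV2>());
        try
        {
            Marshal.StructureToPtr(v2, buffer, false);
            int staleOffset = Marshal.OffsetOf<PointV1>("X").ToInt32();
            return Marshal.ReadInt32(buffer, staleOffset); // yields Flags, not X
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }

    public static void Main()
    {
        Console.WriteLine("V1.X offset: " + Marshal.OffsetOf<PointV1>("X")); // 0
        Console.WriteLine("V2.X offset: " + Marshal.OffsetOf<PointV2>("X")); // 4
        Console.WriteLine("Stale read of X: " + ReadXWithStaleOffset());     // 7
    }
}
```

Nothing fails loudly here: the stale read simply returns the wrong value, which is what makes this class of versioning bug so hard to diagnose.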

The Common Language Runtime

To address the problems with COM contracts and their definitions, the COM and MTS teams at Microsoft set out to develop a new component platform called COM3. Soon after that name was chosen, various parties within Microsoft discovered that COM3 was not a legal directory name under certain Microsoft platforms, so they quickly changed the name to the Component Object Runtime (COR). Other names used during the development cycle included the COM+ Runtime, Lightning, and the Universal Runtime (URT), and then finally, just prior to its first public beta, the technology was renamed to the Common Language Runtime (CLR).

It is difficult to talk about the CLR without discussing the difference between a specification and an implementation. As part of the .NET initiative, Microsoft has submitted large parts of the platform to various standards organizations. In particular, Microsoft has submitted the Common Language Infrastructure (CLI) to ECMA (https://www.ecma.org). The CLI includes the Common Type System (CTS), the Common Intermediate Language (CIL), and the underlying file and metadata formats. However, the CLR itself is not part of the ECMA submission. Rather, the CLR is an implementation of the CLI that is owned and controlled exclusively by Microsoft. In general, this book will not distinguish between the CLI specification and the CLR because, at the time of this writing, no other implementation of the CLI is widely available.

Like the COM platform it replaces, the CLR focuses on the contracts between components. As with COM, these contracts are based on type. However, that is all the two contracts have in common.

Unlike COM, the CLR begins its life on Earth with a fully specified format for describing component contracts. This format is referred to generically as metadata. CLR metadata is machine-readable, and its format is fully specified. Additionally, the CLR provides facilities that let programs read and write metadata without knowledge of the underlying file format. CLR metadata is cleanly and easily extensible via custom attributes, which are themselves strongly typed. CLR metadata also contains component dependency and version information, allowing the use of a new range of techniques to handle component versioning. Finally, the presence of CLR metadata is mandatory; you cannot deploy or load a component without having access to its metadata, something that makes building CLR-based infrastructure and tools considerably easier than in environments (e.g., COM) where metadata is optional.
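A short C# sketch shows what these properties look like from a program's point of view. OwnerAttribute and InvoiceService are hypothetical names chosen for illustration. The attribute's arguments are typed, the program reads them back through reflection without touching the file format, and the assembly's recorded dependencies and versions are available the same way.

```csharp
using System;
using System.Reflection;

// Hypothetical custom attribute; its constructor argument and property are typed.
[AttributeUsage(AttributeTargets.Class)]
public sealed class OwnerAttribute : Attribute
{
    public OwnerAttribute(string team) { Team = team; }
    public string Team { get; private set; }
    public int Priority { get; set; }
}

[Owner("billing", Priority = 2)]
public class InvoiceService { }

public static class MetadataDemo
{
    // Reads the strongly typed attribute back out of the type's metadata.
    public static OwnerAttribute ReadOwner(Type type)
    {
        return (OwnerAttribute)Attribute.GetCustomAttribute(type, typeof(OwnerAttribute));
    }

    public static void Main()
    {
        OwnerAttribute owner = ReadOwner(typeof(InvoiceService));
        Console.WriteLine(owner.Team + " / priority " + owner.Priority);

        // Dependency and version information recorded in this assembly's metadata.
        Assembly self = typeof(InvoiceService).Assembly;
        Console.WriteLine(self.GetName().Name + " " + self.GetName().Version);
        foreach (AssemblyName dependency in self.GetReferencedAssemblies())
        {
            Console.WriteLine("  depends on " + dependency.Name + " " + dependency.Version);
        }
    }
}
```

The list of referenced assemblies depends on how and where the code is compiled, so its exact contents are not shown here.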

The second way that CLR contracts differ from COM contracts is in the very nature of the contract itself. In COM, a component contract implies a precise in-memory representation of a stack frame, a vtable, and any data structures that are passed as method parameters. In this respect, the CLR and COM could not be more different.

Contracts in the CLR describe the logical structure of types. Contracts in the CLR specifically do not describe the in-memory representation of anything. The CLR postpones the decisions regarding in-memory representations until the type is first loaded at runtime. This virtualization of contracts greatly reduces the brittleness of COM's binary contracts because no in-memory representations are assumed between components.

Because a CLR type definition is logical rather than physical, the precise code sequence for accessing a field or method is not baked into the contract. This gives the CLR a great deal of flexibility with respect to virtual method table layout, stack discipline, alignment, and parameter passing conventions, all of which could change between versions of the CLR without the need to recompile components. By referring to fields and methods by name and signature rather than by their offsets, the CLR avoids the order-of-declaration problems that plague COM. The actual address/offset of a member cannot be determined until the type is loaded and initialized at runtime.

The virtualization of data representations and method addressing has one significant requirement. Because the exact physical aspects of a contract (e.g., method table/field offsets) are not known when the consumer of the component is compiled, some mechanism is needed to defer the resolution of these offsets until the code is actually deployed against the final versions of the components on a particular processor architecture. To make this possible, components written for the CLR rarely contain machine code. Rather, CLR-based components use Common Intermediate Language (CIL) for their method implementations.

It is easy to dismiss CIL as a processor-neutral instruction set. However, even if only one processor architecture were ever anticipated, CIL is important because of its ability to abstract away the physical data representation issues inherent in native machine code. To this end, the opcodes used by CIL to access fields and invoke methods do not use absolute offsets or addresses. Rather, those CIL instructions contain references to the metadata for the field or method they operate on. These references are based solely on the name and signature of the field or method and not on its location or offset. As long as the target component has a field or method that matches the name and signature, the physical offsets chosen by the CLR are immaterial.
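Reflection offers a runtime analogy for this style of binding. The sketch below is not how the CLR resolves metadata references internally; it only shows, under that caveat, that a member can be located purely by name and signature. Calculator and BindingDemo are hypothetical names. Reordering the declarations inside Calculator would not change the result.

```csharp
using System;
using System.Reflection;

public class Calculator
{
    // Declaration order is irrelevant to callers that bind by name and signature.
    public int Scale = 3;
    public string Add(string a, string b) { return a + b; }
    public int Add(int a, int b) { return a + b; }
}

public static class BindingDemo
{
    // Locates a method by name and parameter types, then invokes it.
    public static object Call(object target, string name, Type[] signature, object[] args)
    {
        MethodInfo method = target.GetType().GetMethod(name, signature);
        if (method == null)
            throw new MissingMethodException(target.GetType().Name, name);
        return method.Invoke(target, args);
    }

    public static void Main()
    {
        var calc = new Calculator();

        // The same name with different signatures selects different methods.
        Console.WriteLine(Call(calc, "Add",
            new[] { typeof(int), typeof(int) }, new object[] { 2, 3 }));       // 5
        Console.WriteLine(Call(calc, "Add",
            new[] { typeof(string), typeof(string) }, new object[] { "a", "b" })); // ab

        // Fields are located by name as well, never by offset.
        FieldInfo scale = typeof(Calculator).GetField("Scale");
        Console.WriteLine(scale.GetValue(calc));                               // 3
    }
}
```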

It is important to note that the CLR never executes CIL directly. Rather, CIL is always translated into native machine code prior to its execution. This translation can be done either when the component is loaded into memory or preemptively when the component is installed on the deployment machine. In either case, when the CIL-to-native translation is done, the actual in-memory representations of any data types or method tables are used to generate the native machine code, resulting in efficient code with relatively little indirection.

The native code produced by the CLR yields the same high-performance physical coupling that is used in C++ and COM. However, unlike C++ and COM, which calculate this physical coupling at development time, the CLR does not resolve the details of the physical binding until CIL-to-native translation takes place. Because this translation is done on the deployment machine, the type definitions that are needed from external components will match the ones found on the deployment machine and not those on the developer's machine. This greatly reduces the brittleness of cross-component contracts without compromising performance.

Finally, because the CIL-to-native translation occurs on the deployment machine, any processor-specific layout or alignment rules that are used will match the processor architecture that the code will execute on. This is especially important at a time when the industry faces another processor shift as the installed base moves from the existing IA-32/Pentium architecture to the IA-64/Itanium architecture.

The Evolution of the Programming Model

The nature of CLR contracts naturally lends itself to a programming model that is independent of task or programming language. The programming model implied by the CLR is a refinement of the COM programming model and is very type-centric because every entity your program can deal with is affiliated with a type. This applies to objects, values, strings, primitives, and arrays. The type-centricity of the programming model is a necessity because one of the key services provided by the CLR is that code can be verifiably type-safe. This prevents malicious code from hijacking an object reference and invoking methods that are not part of the object's contract.
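The type-centric model is easy to observe from C#. The following minimal sketch (TypeDemo is a hypothetical name) asks several kinds of entities for their type and then attempts to treat an integer as a string. It shows the observable effect of runtime type checking, not the verification algorithm itself.

```csharp
using System;

public static class TypeDemo
{
    // Every value, boxed or not, can report the type it is affiliated with.
    public static string TypeNameOf(object value)
    {
        return value.GetType().FullName;
    }

    // Attempts to use a reference as a string; the runtime checks the cast.
    public static bool CanCastToString(object value)
    {
        try
        {
            string s = (string)value;
            return s != null;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TypeNameOf(42));          // System.Int32
        Console.WriteLine(TypeNameOf(3.5));         // System.Double
        Console.WriteLine(TypeNameOf("hello"));     // System.String
        Console.WriteLine(TypeNameOf(new int[3]));  // System.Int32[]

        Console.WriteLine(CanCastToString("hello")); // True
        Console.WriteLine(CanCastToString(42));      // False
    }
}
```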

Although developers can recompile existing C++ programs for the CLR, most new programs written for the CLR are written at a higher level of abstraction. The CLR encourages a worldview in which everything is a type, an object, or a value. To this end, the CLR provides a range of services that collectively are called managed execution. Under managed execution, the CLR is omniscient and has complete information about all aspects of a running program. This includes knowledge of the state and liveness of local variables in a method. This includes knowledge of where the code for each stack frame originated. This includes knowledge of all extant objects and object references, including reachability information.

Programmers who target the CLR are encouraged to abandon the unmanaged programming style of the past. In particular, programmers are encouraged to give up the explicit management of memory and instead allocate and use instances of types. Similarly, programmers are encouraged to give up manual thread management and instead use the CLR's facilities for concurrent method execution.

For those readers who naturally resist giving up control to a body of software written by someone else, it is important to look back at the move from the DOS-based platform to Windows NT. Initially, there were developers who felt that moving from physical memory and interrupts to virtual memory and threads would be either too slow or too limiting to be practical for all but the most casual programmer. Many of those same arguments will be leveled against the CLR. Only time will tell whether the CLR is 'too much abstraction'; however, it is the author's belief that the shift to managed execution environments such as the CLR or the Java virtual machine (JVM) is a step forward, not backward. As always, when programmers are faced with a choice between productivity and control, technologies that make them more productive tend to win out over time. Figure 1.1 shows the relationship between these two factors.

Figure 1.1. The Move toward Managed Execution

Another key aspect of the new programming model is the heavy reliance on metadata. Independent of the fact that metadata is required to translate CIL to native code, metadata is also made accessible to any program running inside or outside the CLR. The ability to reflect against metadata enables programming techniques in which programs are generated by other programs, not humans. Generative programming is a fairly new discipline. However, most applications of generative programs typically take one program's metadata as input and emit another program as output. The easy accessibility of the input program's metadata makes building generative architectures considerably easier.

The CLR also supports generative programming via a facility called the CodeDOM. The CodeDOM allows C#, VB.NET, and JScript programs to be constructed in memory as a strongly typed object model rather than as a text stream. The CodeDOM enables generative programs that emit source code to postpone the decision as to output language until the last minute. Additionally, the CodeDOM supports in-memory compilation of code, and that allows new generative technologies (including compilers for new programming languages) to deal in a familiar high-level programming language rather than the low-level aspects of CIL and metadata attributes.
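A small C# sketch illustrates the idea of deferring the output language. It assumes the System.CodeDom types are available (they ship with the .NET Framework; on newer runtimes they come from a separate System.CodeDom package), and the names Generated, Oracle, and Answer are hypothetical. One object model is built once and then rendered as both C# and VB.NET source.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;
using Microsoft.VisualBasic;

public static class CodeDomDemo
{
    // Builds a language-neutral object model for a tiny class with one method.
    public static CodeCompileUnit BuildUnit()
    {
        var method = new CodeMemberMethod
        {
            Name = "Answer",
            Attributes = MemberAttributes.Public | MemberAttributes.Final,
            ReturnType = new CodeTypeReference(typeof(int))
        };
        method.Statements.Add(
            new CodeMethodReturnStatement(new CodePrimitiveExpression(42)));

        var type = new CodeTypeDeclaration("Oracle");
        type.Members.Add(method);

        var ns = new CodeNamespace("Generated");
        ns.Types.Add(type);

        var unit = new CodeCompileUnit();
        unit.Namespaces.Add(ns);
        return unit;
    }

    // Renders the same model as source text in whatever language the provider targets.
    public static string Render(CodeDomProvider provider, CodeCompileUnit unit)
    {
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromCompileUnit(unit, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }

    public static void Main()
    {
        CodeCompileUnit unit = BuildUnit();
        Console.WriteLine(Render(new CSharpCodeProvider(), unit));
        Console.WriteLine(Render(new VBCodeProvider(), unit));
    }
}
```

The choice of output language is made only in Main, when a provider is selected; BuildUnit never mentions a language.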

The CLR is a platform for loading and executing code. However, it is difficult to discuss the CLR (or its implied programming model) without addressing programming languages. In general, the CLR supports any programming language that has a compiler that emits CLR metadata and CIL. Programming languages are like flavors of ice cream in that what attracts a person to one language may repulse another person's esthetic sensibilities. To that end, this book will avoid language-specific discussions whenever possible. Unfortunately, some programming language must be used to demonstrate various facets and features of the CLR.

Examples in this book typically use C#. C# is used simply because it is the de facto standard language for the platform, as is evidenced by numerous support tools, documentation, and SDK samples. Note that although C# has its own unique syntax that is largely derived from C, C++, and Java, C# is ultimately just another programming language that imposes its own set of conventions and constructs over the underlying CLR. Table 1.1 shows the trade-offs between the five programming languages explicitly supported by Microsoft in Version 1.0 of the .NET Framework software development kit (SDK).

Table 1.1. .NET Language Features

Feature                   VB.NET     JScript    C#        C++     ILASM
Compiler                  VBC.EXE    JSC.EXE    CSC.EXE   CL.EXE  ILASM.EXE
CodeDOM support           Yes        Yes        Yes       No      No
Dynamic member addition   No         Yes        No        No      No
Late binding              Automatic  Automatic  Manual    Manual  Manual
User-defined value types  Yes        No         Yes       Yes     Yes
Case-sensitive            No         Yes        Yes       Yes     Yes
Unsigned integral types   No         Yes        Yes       Yes     Yes
Method overloading        Yes        Yes        Yes       Yes     Yes
Operator overloading      No         No         Yes       Yes     N/A
C-style pointers          No         No         Yes       Yes     Yes
Native/unmanaged methods  No         No         No        Yes     No
Code verification         Always     Always     Optional  Never   Optional
Opaque/unmanaged types    No         No         No        Yes     Yes
Templates/generics        No         No         No        Yes     No
Multiple inheritance      No         No         No        Yes     No

Where Are We?

The CLR is an evolutionary step in component software. Like its predecessor, COM, the CLR supports the integration of components based on strongly typed contracts. Unlike COM, however, these contracts are based on logical structure and imply no underlying physical data representation. This virtualization gets us one step closer to the Holy Grail of purely semantic contracts. Component contracts are described by CLR metadata, an extensible, machine-readable interchange format that is ubiquitous in CLR-based programs and architectures.


