C is widely used for character and string handling applications. This is odd, in some ways, because the language doesn't really have any built-in string handling features. If you're used to languages that know about string handling, you will almost certainly find C tedious to begin with.
The standard library contains lots of functions to help with string processing but the fact remains that it still feels like hard work. To compare two strings you have to call a function instead of using an equality operator. There is a bright side to this, though. It means that the language isn't burdened by having to support string processing directly, which helps to keep it small and less cluttered. What's more, once you get your string handling programs working in C, they do tend to run very quickly.
Character handling in C is done by declaring arrays (or allocating them dynamically) and moving characters in and out of them 'by hand'. Here is an example of a program which reads text a line at a time from its standard input. If the line consists of the string of characters stop, it stops; otherwise it prints the length of the line. It uses a technique which is invariably used in C programs: it reads the characters into an array and indicates the end of them with an extra character whose value is explicitly 0 (zero). It uses the library strcmp function to compare two strings.
Example 5.6
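The listing for this example is not reproduced here, so what follows is only a minimal sketch of a program matching the description, not the original code. The names LINELNG, read_line and report_lines are our own assumptions; in_line is the array name used later in the text. Unlike the program described, this sketch silently drops characters that do not fit in the array.

```c
#include <stdio.h>
#include <string.h>

#define LINELNG 100   /* assumed array size, including the terminating zero */

/*
 * Read one line from 'in' into 'buf' (which must hold LINELNG chars),
 * a character at a time, and mark the end with an explicit zero.
 * Returns the number of characters stored, or -1 if end of file is
 * reached before any character has been read.
 */
int read_line(FILE *in, char *buf)
{
    char *cp = buf;
    int c;

    while ((c = getc(in)) != EOF && c != '\n') {
        if (cp < &buf[LINELNG - 1])   /* keep one element for the zero */
            *cp++ = (char)c;          /* excess characters are dropped */
    }
    if (c == EOF && cp == buf)
        return -1;
    *cp = 0;                          /* the end-of-string marker */
    return (int)(cp - buf);
}

/*
 * Print the length of each line read from 'in' until the line "stop"
 * or end of file is seen. Returns the number of lines reported.
 */
int report_lines(FILE *in)
{
    char in_line[LINELNG];
    int len, count = 0;

    while ((len = read_line(in, in_line)) >= 0) {
        if (strcmp(in_line, "stop") == 0)
            break;
        printf("line was %d characters long\n", len);
        count++;
    }
    return count;
}
```

A complete program would simply call report_lines(stdin) from main.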
Once more, the example illustrates some interesting methods used widely in C programs. By far the most important is the way that strings are represented and manipulated.
Here is a possible implementation of a simplified strcmp, called str_eq, which compares two strings for equality and returns zero if they are the same. The library function actually does a bit more than that, but the added complication can be ignored for the moment. Notice the use of const in the argument declarations. This shows that the function will not modify the contents of the strings, but just inspects them. The definitions of the standard library functions make extensive use of this technique.
Example 5.7
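The listing itself is missing from this copy. A sketch of such a function, reconstructed from the description in this section (two const char pointer arguments named s1 and s2, a test of *s1 against zero, 0 returned for equal strings and 1 for a difference), could look like this; treat it as an illustration rather than the original code.

```c
/*
 * Compare two strings for equality.
 * Returns 0 if they are the same, 1 if a difference is found.
 */
int str_eq(const char *s1, const char *s2)
{
    while (*s1 == *s2) {
        /* Same character in both: if it is the zero, we are done. */
        if (*s1 == 0)
            return 0;
        s1++;
        s2++;
    }
    /* Difference detected. */
    return 1;
}
```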
Every C programmer 'knows' what a string is. It is an array of char variables, with the last character in the string followed by a null. 'But I thought a string was something in double quote marks', you cry. You are right, too. In C, a sequence like this
    "a string"
is really a character array. It's the only example in C where you can declare something at the point of its use.
Be warned: in Old C, strings were stored just like any other character array, and were modifiable. Now, the Standard states that although they are arrays of char (not const char), attempting to modify them results in undefined behaviour.
Whenever a string in quotes is seen, it has two effects: it provides a declaration and a substitute for a name. It makes a hidden declaration of a char array, whose contents are initialized to the character values in the string, followed by a character whose integer value is zero. The array has no name. So, apart from the name being present, we have a situation like this:
    char secret[9];
an array of characters, terminated by zero, with character values in it. But when it's declared using the string notation, it hasn't got a name. How can we use it?
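To make the hidden declaration concrete, here is a small sketch that builds by hand the nine-element array that the literal "a string" provides implicitly, then checks that the two hold the same bytes. The function names fill_secret and matches_literal are hypothetical, chosen only for this illustration.

```c
#include <string.h>

/* Fill a named array with what the literal "a string" contains. */
void fill_secret(char secret[9])
{
    secret[0] = 'a';
    secret[1] = ' ';
    secret[2] = 's';
    secret[3] = 't';
    secret[4] = 'r';
    secret[5] = 'i';
    secret[6] = 'n';
    secret[7] = 'g';
    secret[8] = 0;      /* the terminating zero */
}

/* Returns 1 if all nine bytes match the unnamed array, else 0. */
int matches_literal(const char secret[9])
{
    /* sizeof applied to the literal counts the zero too: 9 bytes */
    return memcmp(secret, "a string", sizeof "a string") == 0;
}
```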
Whenever C sees a quoted string, the presence of the string itself serves as the name of the hidden array. Not only is the string an implicit sort of declaration, it is as if an array name had been given. Now, we all remember that the name of an array is equivalent to giving the address of its first element, so what is the type of this?
    "a string"
It's a pointer of course: a pointer to the first element of the hidden unnamed array, which is of type char, so the pointer is of type 'pointer to char'. The situation is shown in Figure 5.7.
Figure 5.7. Effect of using a string
For proof of that, look at the following program:
#include <stdio.h>
Example 5.8
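Only the first line of that program survives in this copy. The two loops it is said to contain might be sketched as follows; the function names print_by_pointer and print_by_index, and the character counts they return, are our own additions for illustration.

```c
#include <stdio.h>

/* Walk along the string with a pointer until the zero is found. */
int print_by_pointer(void)
{
    const char *cp;
    int count = 0;

    cp = "a string";        /* the string gives the address of its first char */
    while (*cp != 0) {
        putchar(*cp);
        cp++;
        count++;
    }
    putchar('\n');
    return count;
}

/* Index the string directly, relying on knowing that its length is 8. */
int print_by_index(void)
{
    int i;

    for (i = 0; i < 8; i++)
        putchar("a string"[i]);   /* the string is used like an array name */
    putchar('\n');
    return i;
}
```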
The first loop sets a pointer to the start of the array, then walks along until it finds the zero at the end. The second one 'knows' about the length of the string and is less useful as a result. Notice how the first one is independent of the length; that is a most important point to remember. It's the way that strings are handled in C almost without exception; it's certainly the format that all of the library string manipulation functions expect. The zero at the end allows string processing routines to find out that they have reached the end of the string. Look back now to the example function str_eq. The function takes two character pointers as arguments (so a string would be acceptable as one or both arguments). It compares them for equality by checking that the strings are character-for-character the same. If they are the same at any point, then it checks to make sure it hasn't reached the end of them both with if(*s1 == 0): if it has, then it returns 0 to show that they were equal. The test could just as easily have been on *s2; it wouldn't have made any difference. Otherwise a difference has been detected, so it returns 1 to indicate failure.
In the example, strcmp is called with two arguments which look quite different. One is a character array, the other is a string. In fact they're the same thing: a character array terminated by zero (the program is careful to put a zero in the first 'empty' element of in_line), and a string in quotes, which is a character array terminated by a zero. Their use as arguments to strcmp results in character pointers being passed, for the reasons explained to the point of tedium above.
We said that we'd eventually revisit expressions like
    (*p)++;
and now it's time. Pointers are used so often to walk down arrays that it just seems natural to use the ++ and -- operators on them. Here we write zeros into an array:
Example 5.9
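The listing is absent from this copy. A minimal sketch in the same spirit is given below; the pointer name ip comes from the text, while the array name ar, the size ARLEN and the function name zero_array are assumptions made for the illustration.

```c
#define ARLEN 10          /* assumed array length */

int ar[ARLEN];

/* Write zero into every element of ar, stepping a pointer along it. */
void zero_array(void)
{
    int *ip;

    ip = ar;                      /* start of the array */
    while (ip < &ar[ARLEN])       /* still inside the array? */
        *(ip++) = 0;              /* store zero, then step the pointer on */
}
```

Forming the address &ar[ARLEN], one element past the end, is permitted for comparison as long as nothing is read or written through it.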
The pointer ip is set to the start of the array. While it remains inside the array, the place that it points to has zero written into it, then the increment takes effect and the pointer is stepped one element along the array. The postfix form of ++ is particularly useful here.
This is very common stuff indeed. In most programs you'll find pointers and increment operators used together like that, not just once or twice, but on almost every line (or so it seems while you find them difficult). What is happening, and what combinations can we get? Well, the * means indirection, and ++ or -- mean increment or decrement, in either prefix or postfix form. The combinations can be pre- or post-increment of either the pointer or the thing it points to, depending on where the brackets are put. Table 5.1 gives a list.
++(*p) | pre-increment thing pointed to |
(*p)++ | post-increment thing pointed to |
*(p++) | access via pointer, post-increment pointer |
*(++p) | access via pointer which has already been incremented |
Table 5.1. Pointer notation
Read it carefully; make sure that you understand the combinations.
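One way to check your understanding is to apply the four combinations in turn to a small array and look at what each yields. The sketch below does that; the function name pointer_combinations and the layout of the out array are hypothetical, and the values in the comments follow from the starting contents 10, 20, 30.

```c
/*
 * Apply the four combinations in turn to a pointer into a small array.
 * out[0..3] receive the value of each expression, out[4] the final
 * value of a[0], and out[5] how far p has moved from the start.
 */
void pointer_combinations(int out[6])
{
    int a[3] = {10, 20, 30};
    int *p = a;

    out[0] = ++(*p);        /* a[0] becomes 11; the value is 11 */
    out[1] = (*p)++;        /* the value is 11; a[0] then becomes 12 */
    out[2] = *(p++);        /* the value is a[0], 12; p then points at a[1] */
    out[3] = *(++p);        /* p moves to a[2] first; the value is 30 */
    out[4] = a[0];          /* 12: incremented twice via the pointer */
    out[5] = (int)(p - a);  /* 2: the pointer itself moved twice */
}
```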
The expressions in the list above can usually be understood after a bit of head-scratching. Now, given that the precedence of *, ++ and -- is the same in all three cases and that they associate right to left, can you work out what happens if the brackets are removed? Nasty, isn't it? Table 5.2 shows that there's only one case where the brackets have to be there.
With parentheses | Without, if possible |
++(*p) | ++*p |
(*p)++ | (*p)++ |
*(p++) | *p++ |
*(++p) | *++p |
Table 5.2. More pointer notation
The usual reaction to that horrible sight is to decide that you don't care that the parentheses can be removed; you will always use them in your code. That's all very well but the problem is that most C programmers have learnt the important precedence rules (or at least learnt the table above) and they very rarely put the parentheses in. Like them, we don't, so if you want to be able to read the rest of the examples, you had better learn to read those expressions with or without parentheses. It'll be worth the effort in the end.
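As a taste of the unparenthesized style, here is a sketch of a string copy written the way many C programmers would write it, together with a one-line use of ++*p. Both function names (copy_string and bump_target) are hypothetical; the library's own strcpy is not necessarily implemented this way.

```c
/*
 * Copy the string at src, including its terminating zero, to dst.
 * The caller must make sure dst has room. Returns dst.
 */
char *copy_string(char *dst, const char *src)
{
    char *start = dst;

    /* *dst++ = *src++ means *(dst++) = *(src++): copy one character,
       then step both pointers; stop once the zero has been copied. */
    while ((*dst++ = *src++) != 0)
        ;
    return start;
}

/* ++*p means ++(*p): increment the thing pointed to, not the pointer. */
int bump_target(int *p)
{
    return ++*p;
}
```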
In certain cases it's essential to be able to convert pointers from one type to another. This is always done with the aid of casts, in expressions like the one below:
    (type *) expression
The expression is converted into 'pointer to type', regardless of the expression's previous type. This is only supposed to be done if you're sure that you know what you're trying to do. It is not a good idea to do much of it until you have got plenty of experience. Furthermore, do not assume that the cast simply suppresses diagnostics of the 'mismatched pointer' sort from your compiler. On several architectures it is necessary to calculate new values when pointer types are changed.
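One of the safer uses of such a cast is to look at the storage of an object as a sequence of bytes, since any object may be inspected through a pointer to unsigned char. The sketch below is only an illustration of the cast syntax; the function name count_nonzero_bytes is made up for the purpose.

```c
#include <stddef.h>

/* Count how many bytes in the storage of *ip are non-zero. */
size_t count_nonzero_bytes(const unsigned int *ip)
{
    /* Convert 'pointer to unsigned int' into 'pointer to unsigned char'. */
    const unsigned char *bp = (const unsigned char *)ip;
    size_t i, count = 0;

    for (i = 0; i < sizeof *ip; i++)
        if (bp[i] != 0)
            count++;
    return count;
}
```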
There are also some occasions when you will want to use a 'generic' pointer. The most common example is the malloc library function, which is used to allocate storage for objects that haven't been declared. It is used by telling it how much storage is wanted: enough for a float, or an array of int, or whatever. It passes back a pointer to enough storage, which it allocates in its own mysterious way from a pool of free storage (the way that it does this is its own business). That pointer is then cast into the right type. For example, if a float needs 4 bytes of free store, this is the flavour of what you would write:
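The code that belongs here is missing from this copy. A hedged sketch of it, wrapped in a function so that it can be compiled on its own, is shown below; the pointer name fp and the function name alloc_float_4 are assumptions, and the constant 4 is simply the size the text supposes.

```c
#include <stdlib.h>

/* Ask for 4 bytes and treat them as storage for a float. */
float *alloc_float_4(void)
{
    float *fp;

    fp = (float *)malloc(4);   /* 4: the text's assumed size of a float */
    return fp;                 /* a null pointer if no storage was available */
}
```

The caller is expected to pass the pointer to free once the storage is no longer needed.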
Malloc finds 4 bytes of store, then the address of that piece of storage is cast into pointer-to-float and assigned to the pointer.
What type should malloc be declared to have? The type must be able to represent every known value of every type of pointer; there is no guarantee that any of the basic types in C can hold such a value.
The solution is to use the void * type that we've already talked about. Here is the last example with a declaration of malloc:
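That listing is also missing here. In the sketch below the declaration is obtained by including the standard header rather than written out by hand, which is an assumption on our part; the intermediate variable vp and the function name alloc_float_generic are likewise only illustrative.

```c
#include <stdlib.h>   /* supplies the declaration: void *malloc(size_t size); */

/* Receive the generic pointer first, then convert it. */
float *alloc_float_generic(void)
{
    void *vp;      /* generic pointer, the type malloc returns */
    float *fp;

    vp = malloc(4);        /* 4 is still the text's assumed size of a float */
    fp = (float *)vp;      /* convert to 'pointer to float' */
    return fp;             /* a null pointer if the allocation failed */
}
```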
The rules for assignment of pointers show that there is no need to use a cast on the return value from malloc, but it is often done in practice.
Obviously there needs to be a way to find out what value the argument to malloc should be: it will be different on different machines, so you can't just use a constant like 4. That is what the sizeof operator is for.
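Putting those points together, a portable allocation asks sizeof for the size and needs no cast. The sketch below shows one way this might be written; the function names alloc_float and alloc_int_array are hypothetical.

```c
#include <stdlib.h>

/* Allocate storage for one float, set to zero; a null pointer on failure. */
float *alloc_float(void)
{
    float *fp = malloc(sizeof(float));   /* void * converts on assignment */

    if (fp != NULL)
        *fp = 0.0f;
    return fp;
}

/* Allocate storage for an array of n ints; contents are uninitialized. */
int *alloc_int_array(size_t n)
{
    return malloc(n * sizeof(int));
}
```

In both cases the caller should test the result against a null pointer before using it, and release the storage with free when finished.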