CATEGORII DOCUMENTE |
Bulgara | Ceha slovaca | Croata | Engleza | Estona | Finlandeza | Franceza |
Germana | Italiana | Letona | Lituaniana | Maghiara | Olandeza | Poloneza |
Sarba | Slovena | Spaniola | Suedeza | Turca | Ucraineana |
Indexes have been part of the Xbase arena since the beginning. As newer versions of the various products were introduced, the indexing capabilities were expanded. Visual FoxPro gives us all of the original index types and then adds some new things. In this chapter we will explore the area of indexes and the benefits they provide. We will also explore the area of relational keys. Well see the differences between relational keys and indexes and how indexes can help us in implementing our relational designs.
The term key is used in many different contexts in our work; there are key expressions for indexes, primary keys, foreign keys and candidate keys. Lets catalog all these differing meanings for the term key and expand our understanding of the whole key mess.
One simplified definition for a relational database is this: 'A database that is comprised of multiple tables, which each store information about a single person, place, thing or concept. These tables can be related to each other, through the use of shared values, to produce information that none of the tables can provide alone.' The shared values, referred to in that definition, are relational keys. There are two types of relational keys: primary and foreign. See Figure 1.
Figure 1. A diagram showing primary and foreign keys
and the relation derived with them.
Figure 1 shows a customer and invoice table. You can see that the customer table has CustID as its primary key and the invoice table has InvNo as its primary key. In the invoice table you see a field named CustIDthis is a foreign key connecting an invoice to a customer record. The line shows that relationship.
One absolute rule for relational databases is that every table must have a primary key. If a table does not include any one field or combination of more than one field that can perform the role of primary key, you must add another field to that table to perform that role. Failing to do this will render the table unstable in terms of data integrity. This means that eventually youll have data problems that can be directly attributed to the lack of a primary key.
There can be more than one candidate for the primary key in a table design. For example, consider an employee table that has Employee Number and Social Security Account fields. Either of these fields could be designated as the primary key. They are both unique for each employee, and they are both required for all employees, so either one will do. These two fields would be referred to as candidate keys because they are both candidates for being the primary key. We need to select one of them to perform the role of primary key; when we do, the other one will continue to be called a candidate key.
There are cases in which use of a Social Security Number as a primary key can cause problems. Social security numbers are supposed to be unique, but as with all other absolutes, it isnt so. There have been numerous cases of duplicate social security numbers being issued, and not everyone has one (such as infants and non-citizens). Any system using a social security number as a primary or candidate key should take this possibility into account.
Visual FoxPro has index types named primary and candidate. Are these the same as what we just discussed? Not exactly. The relational terms of primary and candidate key are referring to the roles that certain fields will play in terms of your data design. The Visual FoxPro index types are implementation tools that you can use after the roles have been determined.
You can have a primary key in a table without using a primary index (you could use a regular index instead) and you can create a candidate index on a field that isnt a candidate for the primary key. So you can see that these arent exactly the same things. Well discuss the indexes a little later in this chapter.
According to our earlier definition of 'primary key,' it can be comprised of more than one field. When a primary key is comprised of more than one field it is called a compound primary key. Compound primary keys are perfectly valid within the realm of relational design. However, they present a set of complexities that you should be aware of before you decide to use them.
The first issue with compound primary keys is that theyre more complex than a single field primary key. Applying the KISS principle (Keep It Simple, Stupid) dictates that you select the more simple primary key to use. Youll be able to manage these primary keys much easier if they are single fields.
Another issue isnt directly related to the complexity of the compound primary key. One of the things you must manage related to primary keys is a change in value. Whenever this occurs you must deal with all of those records in other tables that are pointing to the primary key that changed. This is called managing referential integrity, which will be covered in detail in a later chapter. So what does this have to do with compound primary keys?
A compound primary key is comprised of more than one field, and the fields comprising the compound primary key have meaning to the user outside their roles as primary keys. If a field has meaning to a person, that person will eventually want to change that field. When they do change a field that is part of the primary key, you must deal with the fallout in the referential integrity.
What do you do if the only possible primary key is a compound one?
Enter the surrogate primary key. A surrogate key is a field added to the tables structure that has no meaning or purpose other than being the primary key. These surrogate keys can be completely hidden from the users. Because the users never see the surrogate key and it has no meaning other than its role as primary key, its highly unlikely that its value will be changed. Because its value wont change, you dont have to deal with the referential integrity issue related to a change in the value.
The appendix of this book contains a section on data normalization. Data normalization is a process through which you validate your table designs and eliminate problems. There are six steps in the normalization process; however, the last two apply only when there is a compound primary key. That means that if you use surrogate primary keys (which are always single fields) you eliminate the need to apply the last two normalization steps, thus simplifying your work. Hey, were definitely in favor of anything that makes our workload lighter.
Also, because surrogate keys are meaningless, theres no need for the user to provide the key value. You can generate surrogate keys programmatically without any difficulty.
When you use surrogate primary keys, theres usually one or more other candidates in the table structure. For example, consider a customer table. The user will require some kind of customer identification for printing on invoices and so forth. You might name this field CustomerNumber. Lets assume that the user assigns alphanumeric values to the customer number (for example, ACME Corporation is assigned ACME01). This is helpful to the user because the customer number is indicative of the customers name. For the surrogate primary key, you add a field named CustomerID, which is an integer field that holds a unique number. All of the related tables store your CustomerID value for referencing the customer record.
Lets say that one day, ACME gets bought out by Zenith and suddenly the customer number of ACME01 is no longer indicative of the customers name. The user wants to change the customer number to ZENI01. Because youre using a surrogate primary key, you can allow the user to change the customer number and not be concerned with the change. Why? Because that fields value does not appear anywhere else in the system. All of the historic related data will remain connected to the customers record because your surrogate key has not changed its value.
Using surrogate primary keys is a good thing! Okay, so how do you implement them? How do you generate new keys for new records?
Implementing surrogate primary keys
Our recommendation for surrogate primary keys is to use the integer data type. We suggest this data type for two reasons: first, integers can store more than 4 billion unique values in just four bytes if you allow negative values. That number is reduced to just over 2 billion if you use only positive values. Two billion is a pretty large number and four bytes isnt very much space at all.
The second reason for using integers for the surrogate keys is that testing indicates that integers are handled faster by Visual FoxPro than any other data type. This is probably related to the fact that integers are stored on disk exactly the same way they are used in memory, thus eliminating the need for conversion between storage formats.
We will use integer surrogate keys. How do we assign the keys during record creation? The Visual FoxPro database container provides field properties that we can use. One of those properties is the Default Value. This default value is used to populate a field when a new record is appended and no value is specified for the field. For our surrogate key field we can specify the default value as a user-defined function call, which will return the next primary key value.
You can handle the problem of generating unique values for the surrogate key by using a table that holds the next key value to be used for each table in your database. The structure for this table is described in Table 1.
Table 1. The structure for the primary key table named NextKey.dbf.
Field Name |
Data Type |
Size |
Comment |
IidKey |
Integer |
The primary key for this table |
|
CTableName |
Character |
The tables name, stored in uppercase |
|
InextKey |
Integer |
The next value for the primary key for this table |
The next issue is the code that will return the primary keys for you. The following program listing below shows one approach.
Listing 1. An example of a GetKey program.
* GetKey.prg
LPARAMETERS pcTableName
* pcTableName is the name of the table that we want a PK for
* Check for a character parameter
IF TYPE('pcTableName') <> 'C'
* Invalid parameter
RETURN 0
ENDIF
pcTableName = UPPER(pcTableName)
* Save the environment
LOCAL lcAlias, liReturn
lcAlias = ALIAS()
IF NOT USED('NextKey')
SELECT 0
USE NextKey
ELSE
SELECT NextKey
ENDIF
LOCATE FOR cTableName = pcTableName
IF NOT FOUND()
* No record for this table
APPEND BLANK
REPLACE iNextKey WITH 1, ;
cTableName WITH pcTableName
liReturn = 0
ENDIF
* Lock the record
DO WHILE NOT RLOCK()
* Short delay
INKEY(.02,'H')
ENDDO
* Get the PK
liReturn = iNextKey
* Update the iNextKey field
REPLACE iNextKey WITH iNextKey + 1
UNLOCK
IF NOT EMPTY( lcAlias )
SELECT ( lcAlias )
ELSE
SELECT 0
ENDIF
RETURN liReturn
The previous program will get the next key for a particular table. Youll notice that the locking method in this code isnt very robust. You can get away with this approach because this tables record will only be locked for a very short period of time by any station.
In a later chapter we will examine the stored procedures for a database. You might choose to place this code in the databases stored procedures. The only problem with doing so is that the stored procedures are only visible to the tables that are part of the database. This means that youd need to duplicate this code in every database you created. If you make this code a program file of its own, then all tables in all databases could use the same code. We will create this program as a .prg file and not store it in the databases stored procedures.
How do you reference this program from the default value property of the surrogate key? For each tables surrogate key field, youll put GetKey('TableName') in the default value property. Now, whenever you issue an APPEND BLANK for a table, youll get the next primary key automatically assigned to the record.
In the first part of this chapter we discussed the relational keys that youll be using in your databases. Visual FoxPro has the ability to provide indexes that can assist in managing those keys, as well as provide other benefits to your applications. This section of the chapter will present the various types of indexes that are available in Visual FoxPro and some of the pros and cons about using them.
Indexes in Visual FoxPro are files that are independent of the actual data files that provide for a sorting of the records in the data file. Compound index files (CDX) are capable of storing more than one ordering of the data in a single index file. Each ordering is called a tag and each tag has an index expression. The index expression defines the ordering of the records.
Index expressions can be a single field reference or complex expressions comprised of the concatenation of multiple fields. The expression may even contain references to variables that arent part of the table structure.
Index types defined
Visual FoxPro has four available index types: Regular, Unique, Primary and Candidate. Definitions of each index type follow.
Regular indexes
Regular indexes are the simplest type available. There is no uniqueness enforcement, allowing multiple entries in the index for the same index expression value. These are the most common indexes used.
Unique indexes
Unique indexes are similar to regular indexes, in that they dont enforce uniqueness on the expression. However, they do store only one reference in the index for each unique value of the expression. These indexes are the cause of many misunderstandings on the part of developers. You can depend on a unique index accuracy only immediately after it is created.
Lets look at an example to understand the problems. You have a customer table that stores the state in which the customer is located. Youd like to know in which states you have customers. Because you dont want to see the same state listed for every customer you have in that state, use a unique index on state to provide this listing. Create the index as shown in the following code:
INDEX ON State TAG CustState UNIQUE
Using the UNIQUE keyword in the INDEX command creates the unique index for you.
Now imagine that you have 25,000 customers in the state
of
Consider another situation. The unique index creates an
index entry for the first occurrence of a key value. If you have a
unique index on your state field, and the record for the first customer in the
table whose state field is 'NY' is deleted, this entry still appears
in the index. If SET DELETED is ON, you cant 'see' this record, and
it will appear as though you have no customers in
You can see that the unique index isnt being maintained the way you might expect it to be, but there is also no bug in the way it is being maintained. Our conclusion is that if you need a unique index, create it when youre going to use it and delete it when youre through with it. This is the only way you can be assured that the index accurately reflects the data in the table. Also make sure that you include a FOR NOT DELETED() statement in the INDEX command when you create a temporary unique index to insure that SET DELETED does not interfere with the accuracy of your results.
Candidate and primary indexes
Candidate and primary indexes have one thing in common: They restrict the values of their expressions to being unique within the tables data. If you attempt to add a record to a table that duplicates the value of a candidate or primary index expression, Visual FoxPro will generate an error indicating that the uniqueness of the index was violated.
You can use these two index types to prevent duplicate values from getting into a table and enforcing the integrity of the primary or candidate keys for that table. The difference between primary and candidate indexes is that there can be only one primary index on a table but any number of candidate indexes. Primary indexes can exist only for tables that are part of a database, while candidate indexes can be created on free tables.
Using a primary or candidate index is an absolute enforcement of uniqueness on the index expression. There is no way to get around it. This is because the design goal of these two index types is to provide the constraints for candidate and primary keysand those constraints require absolute uniqueness.
This uniqueness includes the consideration of deleted records, because in the field of relational theory, there is no concept of a deleted record. Either the record is in the table or it isnt. Visual FoxPro uses the concept of a record being in the table but marked as deleted. This can cause a problem if you arent aware of it and dont plan ahead for it.
For example, consider the customer table we mentioned earlier. It had a customer number field that was alphanumeric and was a candidate key for the table. If you create a candidate index on that field, it will enforce uniqueness. However, what happens if you delete a customer record and then try to add a new one that uses the same customer number as the deleted record? The candidate index wont allow the new record into the table because the deleted record causes a violation of the uniqueness of the candidate key.
How can you get around this so that deleted records are not considered by the candidate index? You can put a filter on the candidate index on the DELETED() function. This will eliminate the deleted records from the candidate index; thus, those deleted records wont be considered in the uniqueness of the indexs expression. Likewise, if its necessary to create a temporary unique index, and if the table is likely to have deleted records, then including FOR NOT DELETED() in the index expression will address the issue of a unique-indexed deleted record.
Filtering indexes
As stated in the previous paragraph, filtered indexes can be of value. Filtered indexes use the FOR option of the INDEX command (in the visual designers, you can use the Filter column of the designer to specify a FOR clause).
Filtering an index can be helpful in providing a rapid method of seeing only a subset of the records in a table. These indexes also can help with the candidate and primary index and deleted record problem described earlier. However, adding a filter to an index will remove the ability of Rushmore to use that index for optimization. You would need to maintain a duplicate to the filtered index, without the filter, to allow Rushmore to optimize expressions matching the filtered indexs expression.
In the earlier example of the candidate index on CustomerNumber, you would add the FOR condition of NOT DELETED() to the candidate index and create an additional regular index on CustomerNumber with no filter to be used by Rushmore.
Now that you understand various types of indexes, you need to figure out what these indexes can do for you, and how to use them to your best advantage. An index can fill some very specific roles in your system designincluding ordering the data records, managing relations between tables, optimizing performance of various data-access processes, and filtering the records you see in a table. Lets look at these roles individually.
Ordering the records in a table
The most common reason to create an index is to provide an order to the records in a table that is based on some expression. Indexes allow you to have more than one record order without needing to physically sort the table for each order.
The obvious use for ordering records is to use one or more fields in the table to provide a sequence for the records, and then simply create an index on that field or the concatenation of more than one field. What happens if you have a character field and a numeric field and you want an order on the combination of those two? Visual FoxPro has a number of data type conversion functions that you can use to accomplish this. Remember that the key for any index is an expression, and an expression can use Visual FoxPro functions and even User Defined Functions (UDF).
For example, assume you have a table of customers that lists the city and the credit limit for each customer. You want a listing of customers alphabetically by city and numerically by credit limit within the cities. You could index this table like so:
INDEX ON City + STR(CreditLimit) TAG CityCred
What if you wanted the CreditLimit in descending order within city so that the customer with the highest credit limit was first in any given city? Try this:
INDEX ON City + STR(999999999999-CreditLimit) TAG CityCred
Okay, youre saying, so this stuff works with numbers and characters, but what about characters and dates? Assume that your customer table has a field named CustomerSince, which holds the date that any given customer first did business with you. You want the city order and the CustomerSince within the city, and the dates in descending order. Heres the code:
INDEX ON City + STR( CustomerSince) TAG CitySinc
For simple date concatenations, you can use the DTOS() function to convert the date to a character string that preserves the chronological ordering for the date.
How about using UDF in an index expression? Yes, this can be done. You simply use the UDF as you would anywhere else. However, we recommend against it for the following reasons:
In rare situations where the use of a UDF in an index expression might be unavoidable, do what needs to be done. Just make sure it isnt an easy decision to use a UDF in an index expression.
Index expressions can refer to fields in tables other than the one for which the index is being made.
For example, you might want to see invoices in order by customer name and invoice date within the customer names. The Name field is in the Customer table and the Invoice Date field is in the Invoice table, meaning youd need to create the index like so:
USE Invoice
USE Customer ORDER CustID IN 0
SET RELATION TO CustID INTO Customer
INDEX ON Customer.Name + STR(-Invoice.InvDate) TAG CustDate
Again, although this type of index can be built, its a very bad idea. One reason is that the invoice table can no longer be open unless the customer table is already open and the order is set correctly in the customer table. Also, whenever the invoice table may be edited, its required that the relation be set into the customer table. Otherwise, this index will become inaccurate, referring to the wrong customer name. If a customer record is edited, whether or not the invoice table is open, this index wont be updated to reflect any change in a customers name. When you need to see data with this type of ordering, use a view or an SQL SELECT command to get the data and order it (see chapter 3).
Notice in the previous code that the alias names are used in the index expression. This is necessary because of the two-table issue. Using an alias within an index expression can cause problems; for example:
USE Invoice
INDEX ON Invoice.InvNo TAG InvNo
USE
USE Invoice ALIAS MyInvoices && errors will follow
Errors will occur here because Visual FoxPro stored the exact expression that you typed in the index header. This means the index expression in the index files header is 'Invoice.InvNo'. On the fourth line, you open the invoice table and assign it an alias of MyInvoices. What happened when Visual FoxPro tried to update the index tag? An alias not found error! Why? Because Visual FoxPro is looking for the alias Invoice and it doesnt exist. Avoid the use of aliases in index expressions. Remember that alias names are volatile and exist only at run time. It might help if you keep this in mind: Tables dont have alias names; work areas do. A work area doesnt always have an alias name that is identical to the name of the table that is open in it.
Data filtering
Using the FOR clause of the INDEX ON command allows you to filter the records included in an index. When you set the order for the table to a filtered index, you see only the records that are included in the index. This means that in certain situations an index can provide very fast filtering of a table.
For example, imagine a table of customers that includes the CreditLimit field described earlier. In this application its necessary to see these customers in three groups: those with a credit limit between $0 and $1,000; those with a credit limit between $1,001 and $100,000; and those with a credit limit over $100,000. You could open the customer table and then set a filter on it, but this would be relatively slow even if the filter was optimized. The following code shows how you could create three indexes to handle this:
USE Customer
INDEX ON Name FOR CreditLimit <= 1000 TAG CustLow
INDEX ON Name FOR CreditLimit > 1000 AND CreditLimit <=100000 TAG CustMed
INDEX ON Name FOR CreditLimit > 100000 TAG CustHigh
Now, if you want to see the low-limit customers, simply SET ORDER TO CustLow on the Customer table; to see the high-limit customers, youd SET ORDER TO CustHigh. The filtering with these indexes is nearly instantaneous and the navigation between records would be virtually no slower than without the filter.
You also can use this filtering index technique for dynamic filters. Under certain circumstances it may be faster to build a temporary index than to deal with the performance hit caused by a SET FILTER.
Optimizing performance
Visual FoxPros optimization technology is named 'Rushmore.' This technology uses indexes to increase the speed of resolving filters and queries. Rushmore can optimize the expressions associated with the FOR clause and the WHERE clause.
Rushmore does its optimization by using existing (and open) indexes to resolve the conditions in a FOR clause or a WHERE clause, rather than looking at the data in the DBF file itself. Because indexes are smaller than the DBF, more of the index can be held in memory, which speeds up processing. Furthermore, indexes are binary tree structures, so searching for a specific value in an index is much faster than reading each record from a table and checking it for the expression.
An optimizable expression uses one of these four formats:
<Index Key Expression> <Comparison Operator> <Exp>
<Exp> <Comparison Operator> <Index Key Expression>
BETWEEN(<Index Key Expression>, <Exp1>, <Exp2>)
INLIST(<Index Key Expression>, <Exp1>, <Exp2>, )
Here, the <Index Key Expression> must be an exact match with the key expression for an open index. <Exp> and <Exp#> are any expression. The optimizable Comparison operators are <, >, <=, >=, <>, !=, #, =, and ==.
This code shows some examples of optimizable and non-optimizable expressions:
USE Customer
INDEX ON UPPER(LastName) + UPPER(FirstName) TAG Name
* Not optimizable because the index expression is not exactly matched
LOCATE FOR LastName + FirstName = 'SMITH'
* Optimizable
LOCATE FOR UPPER(LastName) + UPPER(FirstName) = 'SMITH'
* Not optimizable
LOCATE FOR UPPER(LastName) = 'SMITH'
Rushmore will not use an index that has a FOR condition (Ex. INDEX ON Name FOR Balance > 1000 TAG BigName will not be used). Also, Rushmore doesnt use indexes created with the UNIQUE option (Ex. INDEX ON State TAG States UNIQUE will not be used). An index with a NOT in its expression will not be used by Rushmore (Ex. INDEX ON NOT Active TAG Inactive will not be used).
Because of Rushmores inability to use filtered indexes, its advisable to create an unfiltered version of any filtered index you may have. Due to the use of indexes for optimization, its advisable to identify the fields that are very likely to be used in selecting records and create indexes on those fields even though you may not use those indexes for ordering records.
Its important to carefully phrase your FOR and WHERE clauses to take advantage of Rushmore so you can get the highest possible performance from your applications.
The DELETED() index
One of the most commonly overlooked selection criterion that is a candidate for having an index is the deleted status of a record. We tend to forget that the command SET DELETED ON is really equivalent to SET FILTER TO DELETED() = .F. When SET DELETED is ON, its irrelevant whether or not there are any deleted records; the filter must be checked by Visual FoxPro.
Because the DELETED ON setting causes the deleted status of records to be checked, having an index on DELETED() will make this test optimizable. We advise that every table have an index on the expression DELETED() in order to fully optimize your queries.
Managing relations
Indexes are required for relations between tables. The syntax of the SET RELATION command demands that the target of the relation be indexed on the expression of the relation. This code shows the syntax of the SET RELATION command:
SET RELATION TO <Expression> INTO <Alias>
Notice that in the syntax of this command there is no statement of what should be matched in the target alias. The target alias must have an index order set that has a key expression matching the expression of the relation. In fact, the SET RELATION command sets up an automatic SEEK to be executed in the target whenever the record pointer is moved in the source alias. The only time an index order is not required in the target is when the expression is numeric. When the expression is numeric and no index order is set in the target, the record number will be used to resolve the relationship.
We have visual mechanisms for setting up relationships, but theyre just faces put on top of the SET RELATION command.
Persistent relations
In a database, youre able to establish relations between tables. These relations are called persistent relations because they persist beyond a given work session. Their general purposes are to set the default for your visual data environment setups and to be used by your referential integrity enforcement code.
Persistent relations are established by dragging a primary or candidate index of one table to an index of another table. These relations are based on an indexs relation to another index. If the target index is regular, the relation will be one-to-many; if the target index is primary or candidate, the relation will be one-to-one.
You can create a persistent relation that has the same table as source and target; this is called a self-relation. When would you ever want to do this? One example would be an employee table where one of the fields was Manager and it contained the EmployeeID of the persons boss. In this case you might want a persistent relation from the Manager index to the EmployeeID index on the Employee table.
Because persistent relations are based entirely on indexes, the existence of the proper indexes to build these relations is critical.
Temporal relations
You create temporal relations in your form and report data environments, as well as in your program code. These relations are called volatile because any trace of their existence disappears when the work session ends. There is no persistence to these relations.
Temporal relations are created based on an expression in the source table being related to an index in the target table. Visually (in a form or report data environment) you can base these relations only on single fields, but by using the SET RELATION command you can use complex expressions for them.
There is no requirement that the temporal relations must match any persistent relations in the database. The persistent relations will be set up as defaults when tables are dragged and dropped to a data environment, but you can delete those relations from the data environment and set other relations.
Indexing views
Yes, you can create indexes on views. You cant store them in any permanent form, so you have to create them every time you open the view. Indexing views can be very beneficial in providing varied sort orders for the data in the view, and in speeding the location of specific records by allowing you to use SEEK.
To create an index on a view, you simply need to select the view and issue an INDEX ON command. If these indexes are created as a structural CDX file, Visual FoxPro will erase them when the view is closed. If theyre created in separate IDX files, then you must erase them yourself.
The only caveat to indexing views is that table buffering will preclude the creation of an index. This is easily handled, though, as follows:
LOCAL lnBuffering
lnBuffering = CursorGetProp( 'BUFFERING', 'TheView' )
CursorSetProp( 'BUFFERING', 5, 'TheView' )
INDEX ON Name TAG Name
CursorSetProp( 'BUFFERING', lnBuffering, 'TheView' )
You can even improve the speed of indexing views by taking advantage of the NODATA option. If you USE a view with the NODATA option, then there will be no records in the view while you create the indexes. In a visual data environment, you can set the NoDataOnLoad property for the cursor to .T. and get the same effect. Look for more information on using indexes with views in Chapter 5.
Politica de confidentialitate | Termeni si conditii de utilizare |
Vizualizari: 1307
Importanta:
Termeni si conditii de utilizare | Contact
© SCRIGROUP 2024 . All rights reserved