StreamTokenizer: StringTokenizer


StreamTokenizer

Although StreamTokenizer is not derived from InputStream or OutputStream, it works only with InputStream objects, so it rightfully belongs in the IO portion of the library.

The StreamTokenizer class is used to break any InputStream into a sequence of "tokens," which are bits of text delimited by whatever you choose. For example, your tokens could be words, and then they would be delimited by white space and punctuation.



Consider a program to count the occurrence of words in a text file:

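//: SortedWordCount.java
// Counts words in a file, outputs
// results in sorted form.
// (Method bodies reconstructed from the description that follows.)
import java.io.*;
import java.util.*;
import c08.*; // Contains StrSortVector

class Counter {
  private int i = 1; // A new Counter already counts one occurrence
  int read() { return i; }
  void increment() { i++; }
}

public class SortedWordCount {
  private FileInputStream file;
  private StreamTokenizer st;
  private Hashtable counts = new Hashtable();
  SortedWordCount(String filename)
    throws FileNotFoundException {
    try {
      file = new FileInputStream(filename);
      st = new StreamTokenizer(file);
      st.ordinaryChar('.');
    } catch(FileNotFoundException e) {
      System.out.println("Could not open " + filename);
      throw e;
    }
  }
  void cleanup() {
    try {
      file.close();
    } catch(IOException e) {
      System.out.println("file.close() unsuccessful");
    }
  }
  void countWords() {
    try {
      while(st.nextToken() != StreamTokenizer.TT_EOF) {
        String s;
        switch(st.ttype) {
          case StreamTokenizer.TT_EOL:
            s = "EOL"; break;
          case StreamTokenizer.TT_NUMBER:
            s = Double.toString(st.nval); break;
          case StreamTokenizer.TT_WORD:
            s = st.sval; break; // Already a String
          default: // Single character in ttype
            s = String.valueOf((char)st.ttype);
        }
        if(counts.containsKey(s))
          ((Counter)counts.get(s)).increment();
        else
          counts.put(s, new Counter());
      }
    } catch(IOException e) {
      System.out.println("st.nextToken() unsuccessful");
    }
  }
  Enumeration values() { return counts.elements(); }
  Enumeration keys() { return counts.keys(); }
  Counter getCounter(String s) { return (Counter)counts.get(s); }
  Enumeration sortedKeys() {
    Enumeration e = counts.keys();
    StrSortVector sv = new StrSortVector();
    while(e.hasMoreElements())
      sv.addElement((String)e.nextElement());
    // This call forces a sort:
    return sv.elements();
  }
  public static void main(String[] args) {
    try {
      SortedWordCount wc = new SortedWordCount(args[0]);
      wc.countWords();
      Enumeration keys = wc.sortedKeys();
      while(keys.hasMoreElements()) {
        String key = (String)keys.nextElement();
        System.out.println(key + ": " + wc.getCounter(key).read());
      }
      wc.cleanup();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
} ///:~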

It makes sense to present these in a sorted form, but since Java 1.0 and Java 1.1 don't have any sorting methods, that will have to be mixed in. This is easy enough to do with a StrSortVector. (This was created in Chapter 8, and is part of the package created in that chapter. Remember that the starting directory for all the subdirectories in this book must be in your class path for the program to compile successfully.)

To open the file, a FileInputStream is used, and to turn the file into words a StreamTokenizer is created from the FileInputStream. StreamTokenizer has a default list of separators, and you can change how individual characters are treated with a set of methods. Here, ordinaryChar( ) is used to say "This character has no significance that I'm interested in," so the parser doesn't include it as part of any of the words that it creates; instead, the character comes back as a single-character token of its own. For example, saying st.ordinaryChar('.') means that periods will not be included as parts of the words that are parsed. You can find more information in the online documentation that comes with Java.
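To see what ordinaryChar( ) changes, the following sketch tokenizes the same text with and without it. It is not part of the original program: the class name OrdinaryCharDemo is hypothetical, the text is read from a StringReader rather than a file, '-' is marked ordinary as an extra illustration, and generic collections are used for brevity.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not from the original text.
class OrdinaryCharDemo {
  // Returns every token as text. When markOrdinary is true,
  // '.' and '-' are declared ordinary before parsing starts.
  static List<String> tokens(String text, boolean markOrdinary) {
    StreamTokenizer st = new StreamTokenizer(new StringReader(text));
    if (markOrdinary) {
      st.ordinaryChar('.');
      st.ordinaryChar('-');
    }
    List<String> out = new ArrayList<>();
    try {
      while (st.nextToken() != StreamTokenizer.TT_EOF) {
        if (st.ttype == StreamTokenizer.TT_WORD)
          out.add(st.sval);
        else if (st.ttype == StreamTokenizer.TT_NUMBER)
          out.add(Double.toString(st.nval));
        else // Single character held in ttype
          out.add(String.valueOf((char) st.ttype));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    String text = "The end. A well-known cat";
    // By default '.' and '-' count as number characters, which may
    // also continue a word, so they stick to the neighboring letters:
    System.out.println(tokens(text, false));
    // prints [The, end., A, well-known, cat]
    System.out.println(tokens(text, true));
    // prints [The, end, ., A, well, -, known, cat]
  }
}
```

Note that the ordinary characters do not disappear: each one arrives as its own token, so a word counter built this way will also count "." as an entry.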

In countWords( ), the tokens are pulled one at a time from the stream, and the ttype information is used to determine what to do with each token, since a token can be an end-of-line, a number, a string, or a single character.
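The four cases can be sketched in isolation as follows. This is an illustration under assumptions, not the book's listing: the class TokenKinds and its method names are hypothetical, input comes from a StringReader, and eolIsSignificant(true) is switched on so that end-of-line tokens are actually reported (by default they are treated as white space).

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not from the original text.
class TokenKinds {
  // Describes the token the tokenizer is currently positioned on.
  static String describe(StreamTokenizer st) {
    switch (st.ttype) {
      case StreamTokenizer.TT_EOL:
        return "EOL";
      case StreamTokenizer.TT_NUMBER:
        return "NUMBER " + st.nval; // Value is in nval
      case StreamTokenizer.TT_WORD:
        return "WORD " + st.sval;   // Text is in sval
      default:                      // ttype is the character itself
        return "CHAR " + (char) st.ttype;
    }
  }

  static List<String> classify(String text) {
    StreamTokenizer st = new StreamTokenizer(new StringReader(text));
    st.eolIsSignificant(true); // Report line ends as TT_EOL
    List<String> out = new ArrayList<>();
    try {
      while (st.nextToken() != StreamTokenizer.TT_EOF)
        out.add(describe(st));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(classify("cats 42 !\ndogs"));
    // prints [WORD cats, NUMBER 42.0, CHAR !, EOL, WORD dogs]
  }
}
```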

Once a token is found, the Hashtable counts is queried to see if it already contains the token as a key. If it does, the corresponding Counter object is incremented to indicate that another instance of this word has been found. If not, a new Counter is created - since the Counter constructor initializes its value to one, this also acts to count the word.
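The counting step on its own might look like the sketch below. It assumes a Counter that starts at one, as the text describes; the class CountSketch and the count( ) method are hypothetical names, and a generic Hashtable is used so the cast in the original idiom is not needed.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical demo class, not from the original text.
class CountSketch {
  static class Counter {
    private int i = 1; // Creating a Counter counts the first sighting
    int read() { return i; }
    void increment() { i++; }
  }

  // Builds a word -> Counter table from an array of tokens.
  static Map<String, Counter> count(String[] words) {
    Map<String, Counter> counts = new Hashtable<>();
    for (String s : words) {
      if (counts.containsKey(s))
        counts.get(s).increment();    // Seen before: bump the count
      else
        counts.put(s, new Counter()); // First sighting: starts at 1
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Counter> counts =
      count(new String[] { "cat", "dog", "cat" });
    System.out.println("cat: " + counts.get("cat").read()); // prints cat: 2
    System.out.println("dog: " + counts.get("dog").read()); // prints dog: 1
  }
}
```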

SortedWordCount is not a type of Hashtable, so it wasn't inherited. It performs a specific type of functionality, so even though the keys( ) and values( ) methods must be re-exposed, that still doesn't mean that inheritance should be used since a number of Hashtable methods are inappropriate here. In addition, other methods like getCounter( ), which get the Counter for a particular String, and sortedKeys( ), which produces an Enumeration, finish the change in the shape of SortedWordCount's interface.

In main( ) you can see the use of a SortedWordCount to open and count the words in a file; it just takes two lines of code. Then an Enumeration of the sorted keys (words) is extracted, and this is used to pull out each key and its associated Counter. Note that the call to cleanup( ) is necessary to ensure that the file is closed.

A second example using StreamTokenizer can be found in Chapter 17.

StringTokenizer

Although it isn't part of the IO library, the StringTokenizer has sufficiently similar functionality to StreamTokenizer that it will be described here.

The StringTokenizer returns the tokens within a string one at a time. By default, these tokens are consecutive characters delimited by white space: spaces, tabs, newlines, carriage returns, and form feeds. Thus, the tokens of the string "Where is my cat?" are "Where", "is", "my", and "cat?". Like the StreamTokenizer, you can tell the StringTokenizer to break up the input in any way that you want, but with StringTokenizer you do this by passing a second argument to the constructor, which is a String of the delimiters you wish to use. In general, if you need more sophistication, use a StreamTokenizer.
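A short sketch of both constructors, using the sentence from the paragraph above. The class name DelimiterDemo and its helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not from the original text.
class DelimiterDemo {
  // Tokenizes with the default white-space delimiters.
  static List<String> defaultTokens(String s) {
    return drain(new StringTokenizer(s));
  }

  // Tokenizes with a caller-supplied set of delimiter characters.
  static List<String> tokens(String s, String delimiters) {
    return drain(new StringTokenizer(s, delimiters));
  }

  private static List<String> drain(StringTokenizer st) {
    List<String> out = new ArrayList<>();
    while (st.hasMoreTokens())
      out.add(st.nextToken());
    return out;
  }

  public static void main(String[] args) {
    System.out.println(defaultTokens("Where is my cat?"));
    // prints [Where, is, my, cat?]
    // Adding '?' to the delimiters strips it from the last token:
    System.out.println(tokens("Where is my cat?", " ?"));
    // prints [Where, is, my, cat]
  }
}
```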

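You ask a StringTokenizer object for the next token in the string using the nextToken( ) method, which returns the token or throws a NoSuchElementException if no tokens remain. To avoid the exception, call hasMoreTokens( ) first. (The next( ) method in the program that follows wraps these two calls and returns an empty string when the tokens run out.)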
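The difference between a guarded and an unguarded call can be sketched as follows. The class SafeNext and its next( ) helper are hypothetical names chosen for illustration.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical demo class, not from the original text.
class SafeNext {
  // Returns the next token, or "" when the tokenizer is exhausted.
  static String next(StringTokenizer st) {
    return st.hasMoreTokens() ? st.nextToken() : "";
  }

  public static void main(String[] args) {
    StringTokenizer st = new StringTokenizer("cat");
    System.out.println("[" + next(st) + "]"); // prints [cat]
    System.out.println("[" + next(st) + "]"); // prints []
    try {
      st.nextToken(); // Unguarded call on an exhausted tokenizer
    } catch (NoSuchElementException e) {
      System.out.println("No tokens remain");
    }
  }
}
```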

As an example, the following program performs a limited analysis of a sentence, looking for key phrase sequences to indicate whether happiness or sadness is implied.

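//: AnalyzeSentence.java
// Look for particular sequences
// within sentences.
// (Method bodies reconstructed from the description that follows;
// the sample sentences in main( ) are illustrative.)
import java.util.*;

public class AnalyzeSentence {
  public static void main(String[] args) {
    analyze("I am happy about this");
    analyze("I am not happy about this");
    analyze("I am sad about this");
    analyze("Are you sad about this?");
  }
  static StringTokenizer st;
  static void analyze(String s) {
    prt("\nnew sentence >> " + s);
    boolean sad = false;
    st = new StringTokenizer(s);
    while (st.hasMoreTokens()) {
      String token = next();
      // Look until you find one of the
      // two starting tokens:
      if(!token.equals("I") && !token.equals("Are"))
        continue; // Top of while loop
      if(token.equals("I")) {
        String tk2 = next();
        if(!tk2.equals("am")) // Must be after I
          break; // Out of while loop
        else {
          String tk3 = next();
          if(tk3.equals("sad")) {
            sad = true;
            break; // Out of while loop
          }
          if (tk3.equals("not")) {
            String tk4 = next();
            if(tk4.equals("sad"))
              break; // Leave sad false
            if(tk4.equals("happy")) {
              sad = true;
              break;
            }
          }
        }
      }
      if(token.equals("Are")) {
        String tk2 = next();
        if(!tk2.equals("you"))
          break; // Must be after Are
        String tk3 = next();
        if(tk3.equals("sad"))
          sad = true;
        break; // Out of while loop
      }
    }
    if(sad) prt("Sad detected");
  }
  static String next() {
    if(st.hasMoreTokens()) {
      String s = st.nextToken();
      prt(s);
      return s;
    }
    else
      return "";
  }
  static void prt(String s) {
    System.out.println(s);
  }
} ///:~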

For each string being analyzed, a while loop is entered and tokens are pulled off the string. Notice the first if statement, which says to continue (go back to the beginning of the loop and start again) if the token is neither an "I" nor an "Are." This means that it will get tokens until an "I" or an "Are" is found. You might think to use == instead of the equals( ) method, but that won't work correctly, since == compares handles (that is, whether two references point to the same object) while equals( ) compares contents.
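The pitfall can be demonstrated with a token taken from a StringTokenizer. This is a minimal sketch with hypothetical names (EqualsDemo, firstToken, and so on); it relies on the tokenizer building each token as a new String object, which is how current JDKs behave for a token that is shorter than the input.

```java
import java.util.StringTokenizer;

// Hypothetical demo class, not from the original text.
class EqualsDemo {
  static String firstToken(String sentence) {
    return new StringTokenizer(sentence).nextToken();
  }

  // True only if both handles refer to the very same object.
  static boolean sameObject(String a, String b) { return a == b; }

  // True if the two strings hold the same characters.
  static boolean sameContents(String a, String b) { return a.equals(b); }

  public static void main(String[] args) {
    String token = firstToken("I am sad");
    // The token is a separate object from the literal "I":
    System.out.println(sameObject(token, "I"));   // prints false
    System.out.println(sameContents(token, "I")); // prints true
  }
}
```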

The logic of the rest of the analyze( ) method is that the pattern that's being searched for is "I am sad," "I am not happy," or "Are you sad?" Without the break statement, the code for this would be even messier than it is. You should be aware that a typical parser (this is a primitive example of one) normally has a table of these tokens and a piece of code that moves through the states in the table as new tokens are read.
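As a rough illustration of the table-driven approach, the sketch below stores the three patterns as state transitions and walks the table one token at a time. It is a simplified stand-in, not a rewrite of analyze( ): the class SadTable, the state names, and the restart-on-mismatch rule are all assumptions, and punctuation is treated as a delimiter, which the original program does not do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical demo class, not from the original text.
class SadTable {
  // TABLE.get(state).get(token) gives the next state, if any.
  static final Map<String, Map<String, String>> TABLE = new HashMap<>();

  static void rule(String from, String token, String to) {
    TABLE.computeIfAbsent(from, k -> new HashMap<>()).put(token, to);
  }

  static {
    rule("START", "I", "I");              // "I am sad"
    rule("I", "am", "I_AM");
    rule("I_AM", "sad", "SAD");
    rule("I_AM", "not", "I_AM_NOT");      // "I am not happy"
    rule("I_AM_NOT", "happy", "SAD");
    rule("START", "Are", "ARE");          // "Are you sad"
    rule("ARE", "you", "ARE_YOU");
    rule("ARE_YOU", "sad", "SAD");
  }

  static String step(String state, String token) {
    Map<String, String> row = TABLE.get(state);
    return row == null ? null : row.get(token);
  }

  static boolean isSad(String sentence) {
    // Assumption: punctuation separates tokens as well as white space.
    StringTokenizer st = new StringTokenizer(sentence, " \t\n\r\f?!.,");
    String state = "START";
    while (st.hasMoreTokens()) {
      String token = st.nextToken();
      String next = step(state, token);
      if (next == null)            // Mismatch: try the token as the
        next = step("START", token); // start of a fresh pattern
      state = (next == null) ? "START" : next;
      if (state.equals("SAD")) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isSad("I am not happy about this")); // prints true
    System.out.println(isSad("I am not sad about this"));   // prints false
    System.out.println(isSad("Are you sad about this?"));   // prints true
  }
}
```

Adding a new pattern here means adding rule( ) calls rather than more nested if statements, which is the main attraction of keeping the grammar in a table.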

You should think of the StringTokenizer only as shorthand for a simple and specific kind of StreamTokenizer. However, if you have a String that you want to tokenize and StringTokenizer is too limited, all you have to do is turn it into a stream with StringBufferInputStream (deprecated since Java 1.1 in favor of StringReader) and then use that to create a much more powerful StreamTokenizer.
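A minimal sketch of that conversion follows. It swaps in StringReader for StringBufferInputStream, since StreamTokenizer has accepted a Reader since Java 1.1; the class name StringToStream and the sample input are hypothetical. The example shows two things StringTokenizer cannot do by itself: parsing numbers and keeping a quoted phrase together.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not from the original text.
class StringToStream {
  static List<String> tokens(String text) {
    // Wrap the String in a Reader and hand it to StreamTokenizer.
    StreamTokenizer st = new StreamTokenizer(new StringReader(text));
    List<String> out = new ArrayList<>();
    try {
      while (st.nextToken() != StreamTokenizer.TT_EOF) {
        switch (st.ttype) {
          case StreamTokenizer.TT_WORD:
            out.add("word:" + st.sval); break;
          case StreamTokenizer.TT_NUMBER:
            out.add("number:" + st.nval); break;
          case '"': // Quoted string: ttype is the quote, body in sval
            out.add("quoted:" + st.sval); break;
          default:
            out.add("char:" + (char) st.ttype);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(tokens("name \"Fluffy the cat\" 3.5"));
    // prints [word:name, quoted:Fluffy the cat, number:3.5]
  }
}
```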


