com.catcode.odf
Class OpenDocumentTextInputStream

java.lang.Object
  extended by java.io.InputStream
      extended by java.io.FilterInputStream
          extended by com.catcode.odf.OpenDocumentTextInputStream

public class OpenDocumentTextInputStream
extends java.io.FilterInputStream

OpenDocumentTextInputStream reads the content of an OASIS Open Document Format text (word processing) file.
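The summary does not say where the wrapped bytes come from. An .odt file is a ZIP archive whose document body is stored in the content.xml entry, so a caller presumably positions a stream on that entry before wrapping it. The helper below is a sketch under that assumption. `OdtContent` and `openContentXml` are hypothetical names, and only standard `java.util.zip` calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class OdtContent {
    /**
     * Hypothetical helper: advances the ZIP stream to the "content.xml" entry and
     * returns the same stream, now positioned at the start of that entry's bytes.
     * Returns null if the archive has no such entry.
     */
    static InputStream openContentXml(ZipInputStream zip) throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if ("content.xml".equals(entry.getName())) {
                return zip; // reads now return -1 at the end of this entry
            }
        }
        return null;
    }
}
```

A caller would check the result for null and then pass it to the single-argument constructor.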

Limitations/restrictions:

You can set two lists of element names (using the OpenDocumentElement class). The capture list is the list of elements whose text you want; the omit list is the list of elements within which text is never output. The default capture list is <text:p> and <text:h>. The default omit list is <text:tracked-changes>.
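One way to picture the capture/omit rule: text is emitted only while the reader is inside at least one capture element and inside no omit element. The sketch below models that rule with nesting counters over a pre-tokenized tag stream. The token format, the class name `CaptureOmitFilter`, and the use of `Set<String>` in place of `OpenDocumentElement` lists are assumptions for illustration, not this class's actual implementation.

```java
import java.util.List;
import java.util.Set;

final class CaptureOmitFilter {
    /**
     * Tokens are "<name>" (start tag), "</name>" (end tag), or plain text.
     * Text is kept only while inside at least one capture element and
     * not inside any omit element.
     */
    static String filter(List<String> tokens, Set<String> capture, Set<String> omit) {
        StringBuilder out = new StringBuilder();
        int captureDepth = 0; // how many open capture elements enclose the current position
        int omitDepth = 0;    // how many open omit elements enclose the current position
        for (String tok : tokens) {
            if (tok.startsWith("</")) {
                String name = tok.substring(2, tok.length() - 1);
                if (capture.contains(name)) captureDepth--;
                if (omit.contains(name)) omitDepth--;
            } else if (tok.startsWith("<")) {
                String name = tok.substring(1, tok.length() - 1);
                if (capture.contains(name)) captureDepth++;
                if (omit.contains(name)) omitDepth++;
            } else if (captureDepth > 0 && omitDepth == 0) {
                out.append(tok);
            }
        }
        return out.toString();
    }
}
```

With the default lists, a paragraph nested inside <text:tracked-changes> contributes nothing, while headings and ordinary paragraphs do.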


Field Summary
 
Fields inherited from class java.io.FilterInputStream
in
 
Constructor Summary
OpenDocumentTextInputStream(java.io.InputStream in)
          Constructs an OASIS Open Document Text input stream.
OpenDocumentTextInputStream(java.io.InputStream in, java.util.ArrayList capture, java.util.ArrayList omit)
          Constructs an OASIS Open Document Text input stream.
 
Method Summary
protected  void analyzeTag(java.lang.String tag)
          Set flags to accept or reject characters in this tag.
protected  void collectEntity()
          Collect all characters up to and including the ending semicolon of the entity.
protected  void collectTag()
          Collects information between angle brackets into a string buffer.
protected  int collectUTF8(int startByte)
          Combine the individual bytes of a UTF-8 sequence into one Unicode character.
protected  void createUTF8Output(int value)
          Split a Unicode value into UTF-8 bytes.
 int read()
          Reads the next byte of data from this input stream.
 int read(byte[] b)
          Reads some number of bytes from the input stream and stores them into the buffer array b.
 int read(byte[] b, int off, int len)
          Reads up to len bytes of data from the input stream into an array of bytes.
 long skip(long n)
          Skips the specified number of bytes in the current ODT file entry.
 
Methods inherited from class java.io.FilterInputStream
available, close, mark, markSupported, reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

OpenDocumentTextInputStream

public OpenDocumentTextInputStream(java.io.InputStream in)
Constructs an OASIS Open Document Text input stream.

Parameters:
in - the actual input stream

OpenDocumentTextInputStream

public OpenDocumentTextInputStream(java.io.InputStream in,
                                   java.util.ArrayList capture,
                                   java.util.ArrayList omit)
Constructs an OASIS Open Document Text input stream. This constructor lets you provide a list of "capture" elements whose content you wish to examine and a list of "omit" elements whose content will always be omitted. These lists must be sorted into Unicode order, since they are searched with binarySearch().

If you want an empty list for either one of these, pass in an empty ArrayList. Passing in null will set you up with the default capture or omit list.

Parameters:
in - the actual input stream
capture - an ArrayList of elements whose content will be read by this stream
omit - an ArrayList of elements whose content will be ignored by this stream
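Because both lists are searched with binarySearch(), an unsorted list can silently miss elements. Here is a minimal sketch of preparing such a list, using plain `String` element names as a stand-in for `OpenDocumentElement` objects (whose API is not documented here). `ElementLists` is a hypothetical helper. `String` natural order compares UTF-16 code units, which matches Unicode code point order for names made of Basic Multilingual Plane characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;

final class ElementLists {
    /** Builds a list sorted in String natural order, ready for binary search. */
    static ArrayList<String> sortedList(String... names) {
        ArrayList<String> list = new ArrayList<>(Arrays.asList(names));
        Collections.sort(list);
        return list;
    }

    /** True if name occurs in the (already sorted) list. */
    static boolean contains(ArrayList<String> sorted, String name) {
        return Collections.binarySearch(sorted, name) >= 0;
    }
}
```

For example, `ElementLists.sortedList("text:p", "text:h")` yields a list that starts with "text:h".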
Method Detail

read

public int read()
         throws java.io.IOException
Reads the next byte of data from this input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. Only bytes within elements on the capture list (and not inside an element on the omit list) are returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.

Returns:
the next byte of data, or -1 if the end of the stream is reached.
Throws:
java.io.IOException - if an I/O error occurs.

read

public int read(byte[] b)
         throws java.io.IOException
Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer.

Throws:
java.io.IOException

read

public int read(byte[] b,
                int off,
                int len)
         throws java.io.IOException
Reads up to len bytes of data from the input stream into an array of bytes. The number of bytes actually read is returned as an integer. See java.io.InputStream for details; this implementation is copied directly from that class.

Throws:
java.io.IOException
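Copying InputStream's version matters because java.io.FilterInputStream.read(byte[], int, int) calls in.read(b, off, len) on the wrapped stream directly, so a subclass that overrides only read() has its filtering bypassed by bulk reads. The demonstration below (all class names hypothetical) shows the bypass and a fix that loops over read(), similar in shape to InputStream's default implementation.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class BulkReadDemo {
    /** Overrides only read(): upper-cases ASCII letters one byte at a time. */
    static class UpperOnlySingle extends FilterInputStream {
        UpperOnlySingle(InputStream in) { super(in); }

        @Override
        public int read() throws IOException {
            int c = in.read();
            return (c >= 'a' && c <= 'z') ? c - 32 : c;
        }
    }

    /** Also overrides read(byte[], int, int) so bulk reads go through read(). */
    static class UpperBoth extends UpperOnlySingle {
        UpperBoth(InputStream in) { super(in); }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) return 0;
            int c = read();
            if (c == -1) return -1;          // nothing left at all
            b[off] = (byte) c;
            int i = 1;
            for (; i < len; i++) {
                c = read();
                if (c == -1) break;          // stop early at end of stream
                b[off + i] = (byte) c;
            }
            return i;
        }
    }

    /** read(byte[]) in FilterInputStream calls this.read(buf, 0, buf.length). */
    static String readBulk(InputStream s) throws IOException {
        byte[] buf = new byte[16];
        int n = s.read(buf);
        return n == -1 ? "" : new String(buf, 0, n, StandardCharsets.US_ASCII);
    }
}
```

Reading "abc" in bulk through `UpperOnlySingle` returns the raw "abc", while `UpperBoth` returns "ABC".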

skip

public long skip(long n)
          throws java.io.IOException
Skips the specified number of bytes in the current ODT file entry.

Parameters:
n - the number of bytes to skip
Returns:
the actual number of bytes skipped
Throws:
java.io.IOException - if an I/O error has occurred
java.lang.IllegalArgumentException - if n < 0
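java.io.FilterInputStream.skip() simply returns in.skip(n), which counts raw bytes of the wrapped stream. If skipping is meant to count only the bytes that read() would deliver (the description above does not say which), one option is to skip by reading. A sketch under that assumption; `SkipByReading` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;

final class SkipByReading {
    /**
     * Skips up to n bytes by calling read() one byte at a time, so any filtering
     * done in read() also applies to skipped bytes. Returns the number actually skipped.
     */
    static long skip(InputStream s, long n) throws IOException {
        if (n < 0) {
            throw new IllegalArgumentException("negative skip length: " + n);
        }
        long skipped = 0;
        while (skipped < n && s.read() != -1) {
            skipped++;
        }
        return skipped; // smaller than n only if the stream ended first
    }
}
```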

collectEntity

protected void collectEntity()
                      throws java.io.IOException
Collect all characters up to and including the ending semicolon of the entity. Accepts entities in the forms &#nnn;, &#xnnn;, and &name;, but checks that named (alphabetic) entities are only the "big five" predefined by XML: &amp;, &lt;, &gt;, &quot;, and &apos;.

This method will fill the utf8Output[] array, set utf8OutputLength appropriately, and set utf8OutputPosition to zero.

If we hit the end of file, put -1 in the utf8 buffer; the main loop in read() will emit it the next time through.

Throws:
java.io.IOException - if I/O error occurs while reading bytes.
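Here is a sketch of the entity check described above, operating on the text between '&' and ';'. It accepts decimal and hexadecimal character references and the five entities predefined by XML, and returns -1 for anything else. `EntityDecoder.decode` and the -1 convention are assumptions; the real method writes UTF-8 bytes into utf8Output[] instead of returning a value.

```java
final class EntityDecoder {
    /** Returns the code point for an entity body such as "#65", "#x3B1", or "amp"; -1 if rejected. */
    static int decode(String body) {
        if (body.startsWith("#x")) {
            return parse(body.substring(2), 16);   // hexadecimal character reference
        }
        if (body.startsWith("#")) {
            return parse(body.substring(1), 10);   // decimal character reference
        }
        switch (body) {
            case "amp":  return '&';
            case "lt":   return '<';
            case "gt":   return '>';
            case "quot": return '"';
            case "apos": return '\'';
            default:     return -1;                // any other named entity is rejected
        }
    }

    private static int parse(String digits, int radix) {
        if (digits.isEmpty() || digits.charAt(0) == '+' || digits.charAt(0) == '-') {
            return -1; // reject empty or signed numbers
        }
        try {
            int value = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(value) ? value : -1;
        } catch (NumberFormatException e) {
            return -1; // non-digit characters or overflow
        }
    }
}
```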

createUTF8Output

protected void createUTF8Output(int value)
Split a Unicode value into UTF-8 bytes. Puts bytes into utf8Output[] and sets the utf8OutputLength appropriately.
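A minimal sketch of the bit layout for splitting a code point into 1 to 4 UTF-8 bytes (as specified in RFC 3629). It returns a new array instead of filling utf8Output[], and `Utf8Encoder` is a hypothetical name. It does not reject surrogate code points (U+D800 to U+DFFF).

```java
final class Utf8Encoder {
    /** Encodes one Unicode code point as UTF-8. */
    static byte[] encode(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not a code point: " + cp);
        }
        if (cp < 0x80) {                       // 0xxxxxxx
            return new byte[] {(byte) cp};
        } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        } else {                               // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
    }
}
```

For instance, U+03B1 (α) encodes to the two bytes 0xCE 0xB1.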


collectTag

protected void collectTag()
                   throws java.io.IOException
Collects information between angle brackets into a string buffer.

Reads from file until encountering a > symbol. If a byte has a value greater than 127, then call collectUTF8() to combine it and the following bytes into a Unicode character.

If we hit the end of file, put -1 in the utf8 buffer; the main loop in read() will emit it the next time through.

Throws:
java.io.IOException - if I/O error occurs while reading bytes.
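Here is a sketch of the tag-collecting step, assuming the opening '<' has already been consumed. Instead of calling a per-character decoder for bytes above 127, as described above, it buffers the raw bytes and decodes them once as UTF-8. This is safe because the byte 0x3E ('>') never occurs inside a multi-byte UTF-8 sequence. Like the described method, it stops at the first '>', even though XML permits an unescaped '>' inside attribute values. `TagCollector` and the null-at-end-of-stream convention are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class TagCollector {
    /**
     * Collects bytes up to (not including) the closing '>' and decodes them as UTF-8.
     * Returns null if the stream ends before a '>' is seen.
     */
    static String collectTag(InputStream in) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '>') {
                return new String(raw.toByteArray(), StandardCharsets.UTF_8);
            }
            raw.write(b);
        }
        return null; // end of stream inside a tag
    }
}
```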


collectUTF8

protected int collectUTF8(int startByte)
                   throws java.io.IOException
Combine the individual bytes of a UTF-8 sequence into one Unicode character.

Parameters:
startByte - the starting byte of a UTF-8 sequence.
Returns:
the decoded Unicode character (code point).
Throws:
java.io.IOException
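Here is a sketch of decoding one UTF-8 sequence once its lead byte has been read. The lead byte's high bits give the number of continuation bytes, and each continuation byte contributes its low six bits. Passing the InputStream explicitly, returning -1 for a sequence cut off by end of stream, and throwing IOException for malformed bytes are assumptions. Overlong encodings and surrogates are not rejected.

```java
import java.io.IOException;
import java.io.InputStream;

final class Utf8Decoder {
    /** Reads the continuation bytes that follow startByte and returns the code point. */
    static int collectUTF8(int startByte, InputStream in) throws IOException {
        int extra;
        int cp;
        if (startByte < 0x80) {
            return startByte;                  // ASCII (or -1 passed through)
        } else if ((startByte & 0xE0) == 0xC0) {
            extra = 1; cp = startByte & 0x1F;  // 110xxxxx
        } else if ((startByte & 0xF0) == 0xE0) {
            extra = 2; cp = startByte & 0x0F;  // 1110xxxx
        } else if ((startByte & 0xF8) == 0xF0) {
            extra = 3; cp = startByte & 0x07;  // 11110xxx
        } else {
            throw new IOException("invalid UTF-8 lead byte: " + startByte);
        }
        for (int i = 0; i < extra; i++) {
            int b = in.read();
            if (b == -1) return -1;            // sequence truncated by end of stream
            if ((b & 0xC0) != 0x80) {
                throw new IOException("invalid UTF-8 continuation byte: " + b);
            }
            cp = (cp << 6) | (b & 0x3F);       // append six payload bits
        }
        return cp;
    }
}
```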

analyzeTag

protected void analyzeTag(java.lang.String tag)
Set flags to accept or reject characters in this tag.

Parameters:
tag - the tag to be analyzed
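Before any flags can be set, the element name and the tag type have to be pulled out of the collected text. Here is a sketch of that parsing step, assuming the input is the text between the angle brackets. `TagInfo` is a hypothetical class. A START or END result for a name on the capture or omit list would then adjust the corresponding flag or counter; EMPTY tags such as <text:s/> enclose no text.

```java
final class TagInfo {
    enum Kind { START, END, EMPTY, OTHER }

    final Kind kind;
    final String name;

    private TagInfo(Kind kind, String name) {
        this.kind = kind;
        this.name = name;
    }

    /** Parses the text between '<' and '>' into a tag kind and element name. */
    static TagInfo parse(String tag) {
        if (tag.isEmpty() || tag.charAt(0) == '?' || tag.charAt(0) == '!') {
            return new TagInfo(Kind.OTHER, null); // XML declaration, PI, comment, CDATA
        }
        Kind kind = Kind.START;
        String body = tag;
        if (body.startsWith("/")) {
            kind = Kind.END;
            body = body.substring(1);
        } else if (body.endsWith("/")) {
            kind = Kind.EMPTY;
            body = body.substring(0, body.length() - 1);
        }
        int end = 0; // the element name runs up to the first whitespace (start of attributes)
        while (end < body.length() && !Character.isWhitespace(body.charAt(end))) {
            end++;
        }
        return new TagInfo(kind, body.substring(0, end));
    }
}
```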