Cheshire3 Object Model - PreParser
API
- class cheshire3.baseObjects.PreParser(session, config, parent=None)
A PreParser takes a Document and returns a modified Document.
For example, the input document might consist of SGML data. The output would be a Document containing XML data.
This functionality allows Workflow chains to be strung together in many ways, perhaps including ways that the original implementation had not foreseen.
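The chaining idea described above can be sketched in a few lines of plain Python. This is an illustration only, not Cheshire3 code: `SketchDocument`, `AmpEscapeSketch`, `WrapInDataSketch` and `run_chain` are hypothetical stand-ins, and the `process_document(session, doc)` method name is an assumption about how a PreParser is invoked. The point is simply that each step takes a document and returns a new one, so steps can be combined in any order.

```python
import re


class SketchDocument:
    """Hypothetical stand-in for a Document: it only wraps some text."""

    def __init__(self, text):
        self.text = text


class AmpEscapeSketch:
    """Illustrative step: escape ampersands that do not start an entity."""

    # '&' not followed by something that looks like 'name;' or '#123;'
    _lone_amp = re.compile(r"&(?!#?\w+;)")

    def process_document(self, session, doc):
        return SketchDocument(self._lone_amp.sub("&amp;", doc.text))


class WrapInDataSketch:
    """Illustrative step: wrap the text in <data> tags."""

    def process_document(self, session, doc):
        return SketchDocument("<data>" + doc.text + "</data>")


def run_chain(session, doc, pre_parsers):
    """Feed the output document of each step into the next step."""
    for pre_parser in pre_parsers:
        doc = pre_parser.process_document(session, doc)
    return doc


# session is unused in this sketch, so None is passed in its place.
result = run_chain(
    None,
    SketchDocument("Fish & chips &lt; 5"),
    [AmpEscapeSketch(), WrapInDataSketch()],
)
print(result.text)  # <data>Fish &amp; chips &lt; 5</data>
```

Because every step has the same input and output type, reordering or extending the list passed to `run_chain` is all that is needed to build a different chain.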
Implementations
The following implementations are included in the distribution by default:
- class cheshire3.preParser.NormalizerPreParser(session, config, parent)
Calls a named Normalizer to do the conversion.
- class cheshire3.preParser.UnicodeDecodePreParser(session, config, parent)
PreParser to turn non-unicode into Unicode Documents.
A UnicodeDecodePreParser should accept a Document with content encoded in a non-unicode character encoding scheme and return a Document with the same content decoded to Python’s Unicode implementation.
- class cheshire3.preParser.FileUtilPreParser(session, config, parent)
Call ‘file’ util to find out the current type of file.
- class cheshire3.preParser.MagicRedirectPreParser(session, config, parent)
Map to appropriate PreParser based on incoming MIME type.
- class cheshire3.preParser.HtmlSmashPreParser(session, config, parent)
Attempts to reduce HTML to its raw text.
- class cheshire3.preParser.RegexpSmashPreParser(session, config, parent)
Strip, replace or keep only data which matches a given regex.
- class cheshire3.preParser.AmpPreParser(session, config, parent)
Escape lone ampersands in otherwise XML text.
- class cheshire3.preParser.MarcToXmlPreParser(session, config, parent=None)
Convert MARC into MARCXML.
- class cheshire3.preParser.MarcToSgmlPreParser(session, config, parent=None)
Convert MARC into Cheshire2’s MarcSgml.
- class cheshire3.preParser.TxtToXmlPreParser(session, config, parent=None)
Minimally wrap text in <data> XML tags.
- class cheshire3.preParser.PicklePreParser(session, config, parent=None)
Serialize Document content using Python pickle.
- class cheshire3.preParser.UnpicklePreParser(session, config, parent=None)
Deserialize Document content using Python pickle.
- class cheshire3.preParser.GzipPreParser(session, config, parent)
Gzip a not-gzipped document.
- class cheshire3.preParser.GunzipPreParser(session, config, parent=None)
Gunzip a gzipped document.
- class cheshire3.preParser.B64EncodePreParser(session, config, parent=None)
Encode document in Base64.
- class cheshire3.preParser.B64DecodePreParser(session, config, parent=None)
Decode document from Base64.
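Several of the implementations above come in inverse pairs (Pickle/Unpickle, Gzip/Gunzip, B64Encode/B64Decode). The following sketch shows the round-trip property such pairs are expected to have, using only the Python standard library. It does not use or reproduce the Cheshire3 classes: `PAIRS`, `apply_forward` and `apply_inverse` are hypothetical names, and whether Cheshire3 calls these exact library functions internally is an assumption.

```python
import base64
import gzip
import pickle

# Each entry is (forward, inverse); every function here maps bytes to bytes.
PAIRS = [
    (gzip.compress, gzip.decompress),
    (base64.b64encode, base64.b64decode),
    (pickle.dumps, pickle.loads),
]


def apply_forward(data):
    """Apply every forward transform in order."""
    for forward, _ in PAIRS:
        data = forward(data)
    return data


def apply_inverse(data):
    """Undo the transforms, last applied first."""
    for _, inverse in reversed(PAIRS):
        data = inverse(data)
    return data


original = "<data>caf\u00e9 &amp; more</data>".encode("utf-8")
packed = apply_forward(original)

# pickle.loads must only ever be given trusted data; here it is our own.
restored = apply_inverse(packed)

print(restored == original)
# Decoding bytes to text mirrors what a Unicode-decoding step would do.
print(restored.decode("utf-8"))
```

The inverse steps have to run in the opposite order from the forward steps, which is why a chain that encodes a document needs its decoding counterpart configured in reverse.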