Python 2.6 Text Processing Beginner's Guide (2010)
Page Count: 380
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewer
- Table of Contents
- Preface
- Chapter 1: Getting Started
- Categorizing types of text data
- Ensuring you have Python installed
- Implementing a simple cipher
- Time for action – implementing a ROT13 encoder
- Time for action – processing as a filter
- Time for action – skipping over markup tags
- Supporting third-party modules
- Time for action – installing SetupTools
- Running a virtual environment
- Time for action – configuring a virtual environment
- Where to get help?
- Summary
- Chapter 2: Working with the IO System
- Parsing web server logs
- Time for action – generating transfer statistics
- Using objects interchangeably
- Time for action – introducing a new log format
- Accessing files directly
- Time for action – accessing files directly
- Time for action – handling compressed files
- Accessing multiple files
- Time for action – spell-checking HTML content
- Accessing remote files
- Time for action – spell-checking live HTML pages
- Time for action – handling urllib2 errors
- Handling string IO instances
- Understanding IO in Python 3
- Summary
- Chapter 3: Python String Services
- Understanding the basics of string objects
- Time for action – employee management
- String formatting
- Time for action – customizing log processor output
- Time for action – adding status code data
- Creating templates
- Time for action – displaying warnings on malformed lines
- Calling string object methods
- Time for action – simple manipulation with string methods
- Summary
- Chapter 4: Text Processing Using the Standard Library
- Reading CSV data
- Time for action – processing Excel formats
- Time for action – CSV and formulas
- Time for action – processing custom CSV formats
- Writing CSV data
- Time for action – creating a spreadsheet of UNIX users
- Modifying application configuration files
- Time for action – adding basic configuration read support
- Time for action – relying on configuration value interpolation
- Time for action – configuration defaults
- Writing configuration data
- Time for action – generating a configuration file
- Reconfiguring our source
- Time for action – creating an egg-based package
- Working with JSON
- Time for action – writing JSON data
- Summary
- Chapter 5: Regular Expressions
- Chapter 6: Structured Markup
- XML data
- SAX processing
- Time for action – event-driven processing
- Time for action – driving incremental processing
- Time for action – creating a dungeon adventure game
- The Document Object Model
- Time for action – updating our game to use DOM processing
- XPath
- Time for action – using XPath in our adventure
- Reading HTML
- Time for action – displaying links in an HTML page
- Summary
- Chapter 7: Creating Templates
- Time for action – installing Mako
- Basic Mako usage
- Time for action – loading a simple Mako template
- Time for action – reformatting the date with Python code
- Time for action – defining Mako def tags
- Time for action – converting mail message to use namespaces
- Inheriting from base templates
- Time for action – updating base template
- Time for action – adding another inheritance layer
- Customizing
- Time for action – creating custom Mako tags
- Overviewing alternative approaches
- Summary
- Chapter 8: Understanding Encodings and i18n
- Understanding basic character encodings
- Unicode
- Encodings in Python
- Time for action – manually decoding
- Time for action – copying Unicode data
- Time for action – fixing our copy application
- The codecs module
- Time for action – changing encodings
- Adopting good practices
- Internationalization and Localization
- Time for action – preparing for multiple languages
- Time for action – providing translations
- Summary
- Chapter 9: Advanced Output Formats
- Dealing with PDF files using PLATYPUS
- Time for action – installing ReportLab
- Time for action – writing PDF with basic layout and style
- Writing native Excel data
- Time for action – installing xlwt
- Time for action – generating XLS data
- Working with OpenDocument files
- Time for action – installing ODFPy
- Time for action – generating ODT data
- Summary
- Chapter 10: Advanced Parsing and Grammars
- Chapter 11: Searching and Indexing
- Understanding search complexity
- Time for action – implementing a linear search
- Text indexing
- Time for action – installing Nucular
- Time for action – full text indexing
- Time for action – measuring index benefit
- Time for action – field-qualified indexes
- Time for action – performing advanced Nucular queries
- Indexing and searching other data
- Time for action – indexing Open Office documents
- Other index systems
- Summary
- Appendix A: Looking for Additional Resources
- Appendix B: Pop Quiz Answers
- Chapter 1: Getting Started
- Chapter 2: Working with the IO System
- Chapter 3: Python String Services
- Chapter 4: Text Processing Using the Standard Library
- Chapter 5: Regular Expressions
- Chapter 6: Structured Markup
- Chapter 7: Creating Templates
- Chapter 8: Understanding Encodings and i18n
- Chapter 9: Advanced Output Formats
- Chapter 11: Searching and Indexing
- Index