Automated Data Collection with R, by Simon Munzert, Christian Rubba, Peter Meißner, Dominic Nyhuis
Page Count: 477
- Automated Data Collection with R
- Contents
- Preface
- 1 Introduction
- Part One A Primer on Web and Data Technologies
- 2 HTML
- 2.1 Browser presentation and source code
- 2.2 Syntax rules
- 2.3 Tags and attributes
- 2.3.1 The anchor tag <a>
- 2.3.2 The metadata tag <meta>
- 2.3.3 The external reference tag <link>
- 2.3.4 Emphasizing tags <b>, <i>, <strong>
- 2.3.5 The paragraphs tag <p>
- 2.3.6 Heading tags <h1>, <h2>, <h3>, ...
- 2.3.7 Listing content with <ul>, <ol>, and <dl>
- 2.3.8 The organizational tags <div> and <span>
- 2.3.9 The <form> tag and its companions
- 2.3.10 The foreign script tag <script>
- 2.3.11 Table tags <table>, <tr>, <td>, and <th>
- 2.4 Parsing
- Summary
- Further reading
- Problems
- 3 XML and JSON
- 4 XPath
- 5 HTTP
- 6 AJAX
- 7 SQL and relational databases
- 8 Regular expressions and essential string functions
- Part Two A Practical Toolbox for Web Scraping and Text Mining
- 9 Scraping the Web
- 9.1 Retrieval scenarios
- 9.1.1 Downloading ready-made files
- 9.1.2 Downloading multiple files from an FTP index
- 9.1.3 Manipulating URLs to access multiple pages
- 9.1.4 Convenient functions to gather links, lists, and tables from HTML documents
- 9.1.5 Dealing with HTML forms
- 9.1.6 HTTP authentication
- 9.1.7 Connections via HTTPS
- 9.1.8 Using cookies
- 9.1.9 Scraping data from AJAX-enriched webpages with Selenium/Rwebdriver
- 9.1.10 Retrieving data from APIs
- 9.1.11 Authentication with OAuth
- 9.2 Extraction strategies
- 9.3 Web scraping: Good practice
- 9.4 Valuable sources of inspiration
- Summary
- Further reading
- Problems
- 10 Statistical text processing
- 11 Managing data projects
- Part Three A Bag of Case Studies
- 12 Collaboration networks in the US Senate
- 13 Parsing information from semistructured documents
- 14 Predicting the 2014 Academy Awards using Twitter
- 15 Mapping the geographic distribution of names
- 16 Gathering data on mobile phones
- 17 Analyzing sentiments of product reviews
- References
- General index
- Package index
- Function index
- EULA