Home
Eli Duke edited this page Feb 20, 2024 · 22 revisions
Nokogiri is a simple HTML / XML parser with much of its interface borrowed from Hpricot. It uses libxml2 to parse and search, so it is very fast.
To install Nokogiri, run the following command:
gem install nokogiri
Parsing HTML is easy, and you can take advantage of CSS selectors or XPath queries to find things in your document:
require 'open-uri'
require 'nokogiri'

# Perform a Google search
doc = Nokogiri::HTML(URI.open('http://google.com/search?q=tenderlove'))

# Print out each link using a CSS selector
# (Google's markup changes over time, so this selector may need updating)
doc.css('h3.r > a.l').each do |link|
  puts link.content
end
Here is an example parsing some HTML and searching it using a combination of CSS selectors and XPath selectors:
require 'nokogiri'

doc = Nokogiri::HTML.parse(<<-eohtml)
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <h1>This is an awesome document</h1>
    <p>
      I am a paragraph
      <a href="http://google.ca">I am a link</a>
    </p>
  </body>
</html>
eohtml

####
# Search for nodes by css
doc.css('p > a').each do |a_tag|
  puts a_tag.content
end

####
# Search for nodes by xpath
doc.xpath('//p/a').each do |a_tag|
  puts a_tag.content
end

####
# Or mix and match.
doc.search('//p/a', 'p > a').each do |a_tag|
  puts a_tag.content
end

####
# Find attributes and their values
doc.search('a').first['href'] # => "http://google.ca"
- New, incomplete guide (please contribute): “Nokogiri for jQuery Users” series