Saturday, April 11, 2009

HTML DOM Parsing

If you need to extract data from a web page and cannot get exactly the data you want using pattern matching or "PHP String Positions" techniques, here is a suggestion for you.

It is Simple HTML DOM Parser, a PHP 5+ library written by a developer on SourceForge.net.

Developer Profile:
http://sourceforge.net/users/me578022/

Download Link:
http://sourceforge.net/projects/simplehtmldom/

Documentation:
http://simplehtmldom.sourceforge.net/



How to Use:

First of all, download the simple_html_dom.php file from the download link.


// Include the downloaded file in your own PHP file.
include('../simple_html_dom.php');

// Create a DOM first, using one of the following:
$html = file_get_html('http://www.google.com/'); // Create DOM from a URL or file
// $html = str_get_html($str); // Create DOM from a string of HTML ($str holds the markup)


// find all links
foreach($html->find('a') as $element)
echo $element->href;

// find all images
foreach($html->find('img') as $element)
echo $element->src;

// find all images with full tag
foreach($html->find('img') as $element)
echo $element->outertext;

// find all div tags with id=gbar
foreach($html->find('div#gbar') as $element)
echo $element->innertext;

// find all span tags with class=gb1
foreach($html->find('span.gb1') as $element)
echo $element->outertext;

// find all td tags with attribute align=center
foreach($html->find('td[align=center]') as $element)
echo $element->innertext;

// extract text from a table cell: the second matching td (the index is zero-based)
echo $html->find('td[align="center"]', 1)->plaintext;

// extract text from HTML
echo $html->plaintext;
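
The snippets above all work on a live page. To try the parser without a network connection, here is a minimal sketch that parses a string instead. It assumes simple_html_dom.php sits in the same folder as the script; the sample markup and the variable names $markup and $links are made up for illustration only.

```php
<?php
// Assumption: simple_html_dom.php is in the same folder as this script.
include('simple_html_dom.php');

// Hypothetical sample markup, used only for illustration.
$markup = '<div id="menu"><a href="/home">Home</a> <a href="/about">About</a></div>';

// Create the DOM from a string and stop if parsing failed.
$html = str_get_html($markup);
if (!$html) {
    die('Could not parse the markup');
}

// Collect the href of every link inside the div with id=menu.
$links = array();
foreach ($html->find('div#menu a') as $element) {
    $links[] = $element->href;
}

// Print one link per line.
echo implode("\n", $links), "\n";

// Free the memory held by the DOM once you are done with it.
$html->clear();
```

Collecting the values into an array first, rather than echoing inside the loop, makes it easy to separate them with line breaks or reuse them later.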



See the documentation for more examples.
