PHP - XPATH - Scraping Data From A Page

masterjake · Sep 1, 2009

My friend wanted me to create a script to scrape his Xbox gamertag and total XP from a page where they're stored and automatically updated. I barely even know how XPath works.

Can someone tell me how I would do this?

There's a ton of tags but the actual data I need to retrieve on the page is located in a set like this:

Code:

<div class="user">
    <h3>
        NEED TO RIP THIS
    </h3>
    <div class="userpic">
    </div>
    <div class="userinfo">
        <dl>
            <dt></dt>
            <dd>NEED TO RIP THIS</dd><!-- TotalRank -->
        </dl>
        <dl>
            <dt></dt>
            <dd></dd>
        </dl>
        <dl>
            <dt></dt>
            <dd></dd>
        </dl>
        <dl>
            <dt></dt>
            <dd></dd>
        </dl>
        <dl>
            <dt/>
            <dd>
            </dd>
        </dl>
    </div>
</div>

I have deleted all data out of the tags and put a "NEED TO RIP THIS" statement in the 2 spots that I need to scrape the data from. Can someone help me?

misson · Sep 1, 2009

Have you read any tutorials on XPATH? What XML library do you want to use to parse the document (DOM, libxml, SimpleXML, XML or XMLReader)?

masterjake · Sep 1, 2009

I've read a few, but they didn't explain very well why things were happening, e.g. the tag order and all that.

I want to use DOM.

misson · Sep 1, 2009

You understand filesystem paths, right? XPath is a little like filesystem paths given the 6-million-dollar man treatment, with a teleporter, targeting computer and a high-powered sensor array. A filesystem path selects a file or directory, whereas an XPath path selects document nodes. Nodes can be elements, attributes, comments, text, processing instructions and namespaces (note there is some overlap with DOM nodes).

Every step in an XPath path has an axis, a node test and zero or more predicates ("axis::test[predicate]"), and steps are separated by forward slashes, while filesystem path steps only have a simple node test (no axis or predicate). In XPath, axes basically say where to go from the previous node (parent, children, descendants or siblings). Filesystem paths offer only the "child::" axis, which you don't need to specify (in either XPath or fs paths). Predicates are filters; if filesystem paths had predicates, they would do things like let you specify file size, owner, modification time or permissions in the path (e.g. "./*.db[size>1M]" to select all files in the current directory ending in ".db" with size greater than 1 MiB).
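To make the axis/test/predicate idea concrete, here is a rough sketch against markup shaped like the snippet in the first post. The values "ExampleTag" and "12345" and the <dt> labels are placeholders I made up, and I'm assuming the class attributes are exactly "user" and "userinfo"; the real page may well differ.

```php
<?php
// Hypothetical sample markup modelled on the first post's snippet.
// "ExampleTag", "12345" and the <dt> labels are placeholders, not real data.
$html = <<<HTML
<div class="user">
  <h3>
    ExampleTag
  </h3>
  <div class="userinfo">
    <dl><dt>Total</dt><dd>12345</dd></dl>
    <dl><dt>Other</dt><dd>678</dd></dl>
  </div>
</div>
HTML;

$doc = new DOMDocument();
// loadHTML() copes with real-world HTML; @ hides warnings about sloppy markup.
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// "//" abbreviates the descendant-or-self axis, "div" is the node test,
// and [@class="user"] is a predicate that filters on the class attribute.
$gamertag = trim($xpath->evaluate('string(//div[@class="user"]/h3)'));

// Plain child steps, with a positional predicate picking the first <dl>.
$total = trim($xpath->evaluate(
    'string(//div[@class="user"]/div[@class="userinfo"]/dl[1]/dd)'
));

echo $gamertag, "\n"; // ExampleTag
echo $total, "\n";    // 12345
```

For a live page you could probably swap loadHTML() for DOMDocument::loadHTMLFile() with the page's URL, provided allow_url_fopen is enabled on your host.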

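The nodes a step selects are numbered in document order or reverse document order. Which you get depends on whether the axis is a forward axis (e.g. "descendant::", "following-sibling::") or a reverse axis (e.g. "ancestor::", "preceding-sibling::"), so a positional predicate such as "[1]" picks the node nearest the context node along that axis.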
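A small sketch of how axis direction affects positional predicates. The <list>/<item> document is invented purely for illustration and has nothing to do with your page.

```php
<?php
// Made-up document, used only to show forward vs. reverse axis numbering.
$doc = new DOMDocument();
$doc->loadXML('<list><item>a</item><item>b</item><item>c</item><item>d</item></list>');
$xpath = new DOMXPath($doc);

// Context node: the last <item> ("d").
$last = $xpath->query('/list/item[last()]')->item(0);

// Reverse axis: position 1 is the nearest preceding sibling.
$nearestBefore = $xpath->evaluate('string(preceding-sibling::item[1])', $last);

// Context node: the first <item> ("a").
$first = $xpath->query('/list/item[1]')->item(0);

// Forward axis: position 1 is the nearest following sibling.
$nearestAfter = $xpath->evaluate('string(following-sibling::item[1])', $first);

echo $nearestBefore, "\n"; // c
echo $nearestAfter, "\n";  // b
```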

For more information, read the tutorials. If you have specific questions about features of XPath (such as your question about node order), read the XPath 1.0 standard or ask them here.

Note that with the DOM extension, you can use DOMDocument::getElementById(), DOMDocument::getElementsByTagName() and DOMNode::$childNodes instead of using XPath, but the resulting PHP code will be more complex.
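For comparison, a sketch of the non-XPath route using only DOM methods, again on placeholder markup ("ExampleTag" and "12345" are invented values). It assumes the first <dd> inside the "user" div is the one you want.

```php
<?php
// Hypothetical sample markup; the values are placeholders, not real data.
$html = <<<HTML
<div class="user">
  <h3>
    ExampleTag
  </h3>
  <div class="userinfo">
    <dl><dt>Total</dt><dd>12345</dd></dl>
    <dl><dt>Other</dt><dd>678</dd></dl>
  </div>
</div>
HTML;

$doc = new DOMDocument();
@$doc->loadHTML($html); // @ hides warnings about sloppy markup

$gamertag = null;
$total = null;

// Walk every <div> and stop at the first one whose class is exactly "user".
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') !== 'user') {
        continue;
    }
    // item(0) returns the first match in document order, or null if none.
    $h3 = $div->getElementsByTagName('h3')->item(0);
    $dd = $div->getElementsByTagName('dd')->item(0);
    if ($h3 !== null) {
        $gamertag = trim($h3->textContent);
    }
    if ($dd !== null) {
        $total = trim($dd->textContent);
    }
    break;
}

echo $gamertag, "\n"; // ExampleTag
echo $total, "\n";    // 12345
```

The loop and the null checks do the work that a single XPath expression would otherwise do, which is what I mean by "more complex".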

masterjake · Sep 1, 2009

Thank you. I will get on that.
