
The easiest way to scrape details from a myspace profile page with php (you won’t believe how simple it is)

This entry was posted on Mar 17 2007

It’s amazing how just a little optimization on the part of MySpace makes crawling their site so much easier. We’re going to scrape the user details (name, age, sex, etc.) from a profile, using the meta description tag in the page header, like so:

Set your MySpace URL:

$your_profile_url = 'http://www.myspace.com/waxjelly';

Now grab the page using the “file()” function, which returns it as an array of lines. (We want an array so we can loop over it and use “trim()” to clean up each line.)

$file = file($your_profile_url);

Now we set up an empty string and loop through the profile array, appending each trimmed line to it.

$profile = '';
for ($i = 0; $i < count($file); $i++) {
    $profile .= trim($file[$i]);
}
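As a rough sketch, the fetch-and-clean step can also be wrapped in a small function that checks whether “file()” failed. The helper name load_collapsed() is made up for this example, and the demo reads a local temporary file rather than a live profile so it runs without network access.

```php
<?php
// Hypothetical helper (not part of the original script): read a file or URL
// into a single string with the whitespace around each line removed.
function load_collapsed(string $source): ?string
{
    // file() returns an array of lines, or false on failure.
    $lines = @file($source);
    if ($lines === false) {
        return null;
    }
    // trim() every line, then join them, just like the loop above.
    return implode('', array_map('trim', $lines));
}

// Demo on a temporary local file with invented contents.
$tmp = tempnam(sys_get_temp_dir(), 'profile');
file_put_contents($tmp, "<html>\n  <head>\n    <title>demo</title>\n  </head>\n</html>\n");
$collapsed = load_collapsed($tmp);
unlink($tmp);

echo $collapsed, "\n";
```

Returning null on failure gives the rest of the script something to check before it starts exploding strings.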

Now we use simple explode() calls to do the rest of the work. What we’re looking for is the “<meta” description tag near the top of the file, which holds the basic details. (This was put in place when MySpace partnered with Google for search optimization. Thanks, guys.)

$det_arr = explode('<meta name="description" content="myspace profile - ', strtolower($profile));
$det_arr = explode('" />', $det_arr[1]);
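The two explode() calls above assume the marker text is always present. Here is a minimal sketch of the same idea with a guard for pages where it is missing. The helper between() and the sample markup are invented for illustration; the real page may not match this exact tag.

```php
<?php
// Hypothetical helper: return the text between $start and $end,
// or null when either marker is missing from $haystack.
function between(string $haystack, string $start, string $end): ?string
{
    $parts = explode($start, $haystack, 2);
    if (count($parts) < 2) {
        return null; // start marker not found
    }
    $rest = explode($end, $parts[1], 2);
    if (count($rest) < 2) {
        return null; // end marker not found
    }
    return $rest[0];
}

// Invented sample markup, shaped like the tag described above.
$sample = '<head><meta name="description" content="MySpace Profile - Tom, 31, Male, , Texas, US, Hello" /></head>';
$start  = '<meta name="description" content="myspace profile - ';

$description = between(strtolower($sample), $start, '" />');
var_dump($description);

// A page without the tag gives null instead of an "undefined offset" notice.
$missing = between('<head><title>no meta here</title></head>', $start, '" />');
var_dump($missing);
```

If the site changes its markup, the null result makes it obvious that the marker is gone, rather than quietly producing an array of empty values.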

We’ve got the whole string that we need, but now we need to split it into an array we can manage. This is the prettiest part of this script. MySpace prints every element even if it’s blank, so if you leave your city blank, we’ll get a nice little “…Male, , Texas,…” string. Note the double commas. That means we can explode on the comma and still get a consistently indexed array. (Index 3 will always be city, even if it’s blank. And index 4 will always be state, even if city is blank. Make sense?)

$details = explode(',', $det_arr[0]);
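To see the blank-field behaviour in isolation, here is a small sketch on an invented description string with the city left empty. The trim() pass is an addition of this example: the separators are a comma plus a space, so the raw pieces carry a leading space.

```php
<?php
// Invented sample string with the city left blank.
$description = 'tom, 31, male, , texas, us, hello';

// explode() keeps empty fields, so every position stays fixed.
$details = explode(',', $description);

// Strip the space that follows each comma.
$details = array_map('trim', $details);

var_dump($details[3]); // city: an empty string
var_dump($details[4]); // state
```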

Now that we have the array that we want, we simply assign its elements to an associative array with readable keys.

$det['name'] = $details[0];
$det['age'] = $details[1];
$det['sex'] = $details[2];
$det['city'] = $details[3];
$det['state'] = $details[4];
$det['country'] = $details[5];
$det['phrase'] = $details[6];

… and print the results.

print_r($det);
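The index-by-index assignment above can also be sketched with array_combine(). The function name parse_details() is hypothetical. The split limit and the padding are additions for this example: the limit keeps commas inside the phrase from spilling into extra fields, and the padding keeps short input from breaking array_combine().

```php
<?php
// Hypothetical helper: map the comma-separated description onto named keys.
function parse_details(string $description): array
{
    $keys = ['name', 'age', 'sex', 'city', 'state', 'country', 'phrase'];

    // Split into at most 7 parts so commas inside the phrase stay in the last field.
    $values = array_map('trim', explode(',', $description, count($keys)));

    // Pad short input with empty strings so both arrays have the same length.
    $values = array_pad($values, count($keys), '');

    return array_combine($keys, $values);
}

// Invented sample: blank city, and a phrase that itself contains a comma.
$det = parse_details('tom, 31, male, , texas, us, hello, world');
print_r($det);
```

A comma inside the name would still shift every later field, so this is only as reliable as the string it is given.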

That’s it! You can get a working version of the script here. Enjoy!

Your Friend and Mine,
Meshach



12 Responses to “The easiest way to scrape details from a myspace profile page with php (you won’t believe how simple it is)”

  1. Hi,
    nice tutorial, thanks ;-)
    I just translated it into German; you can find it here:
    http://www.php-developer-blog.de/50226711/myspacecomprofil_einfach_mit_php_auslesen.php

    Regards from Stuttgart
    Conny


  2. Rock on Conny, rock-on!


  3. How do I run the script?


  4. I’m not really sure how I’m going to use this one. I don’t have much reason for scraping MySpace profiles that I can think of.


  5. Am I missing something? I made the page, then downloaded the working version to make sure, but I don’t get any info… just “Array ( [name] => [age] => [sex] => [city] => [state] => [country] => [phrase] => )” prints out in the browser and none of my info… any help would be appreciated. I’m working on one of those fancy flash layouts for my myspace page and this would put it over the top… Will gladly share when / if it gets done…


  6. Don’t you hate when you download something and can’t find it and have to download it again?


  7. “Am I missing something? I made the page, then downloaded the working version to make sure, but I don’t get any info”

    Looking at a MySpace profile, it seems the info is now stored in [title], not in [meta], so the script should be changed accordingly.


  8. I don’t think this version works anymore. MySpace has been changing a lot of stuff on their site, and very often.


  9. I want to use this…but I get the array printing and not the actual data…any help appreciated

    http://www.jenomedia.com/crawlmyspace/crawlmyspace.php


  10. I’m trying to replace a block of text.

    .contactTable .whitetext12{VISIBILITY:HIDDEN;}

    Your Text

    Anyone know what the problem is?


  11. I’m trying to replace a block of text.

    .contactTable .whitetext12{VISIBILITY:HIDDEN;}

    DIV

    Your Text

    Anyone know what the problem is?


