The easiest way to scrape details from a MySpace profile page with PHP (you won’t believe how simple it is)
It’s amazing how just a little optimization on MySpace’s part makes crawling their site so much easier. We’re going to scrape the user details (name, age, sex, etc.) from a profile using the header info, like so:
Set your MySpace URL:
$your_profile_url = 'http://www.myspace.com/waxjelly';
Now grab the file using the “file()” function. (We want an array of lines, so we can loop over it and use “trim()” to clean each line up.)
$file = file($your_profile_url);
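If you want to guard against a failed fetch, here is a minimal sketch. The helper name fetch_lines() is made up for this example; it relies on file() returning false on failure, and on allow_url_fopen being enabled when you pass it a URL. The demo reads a local temp file so it runs without touching the network.

```php
<?php
// Hypothetical helper: read a URL or local path into an array of lines.
function fetch_lines(string $source): array
{
    // FILE_IGNORE_NEW_LINES strips the trailing newline from each element.
    $lines = @file($source, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        // file() returns false on failure (bad path, unreachable URL, etc.).
        return [];
    }
    return $lines;
}

// Demo against a local temp file instead of a live profile URL.
$tmp = tempnam(sys_get_temp_dir(), 'msp');
file_put_contents($tmp, "<html>\n  <head>\n</html>\n");
$lines = fetch_lines($tmp);
unlink($tmp);

print_r($lines);
```

An empty array back from fetch_lines() means there is nothing to parse, so you can stop early instead of printing blank details.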
Now we set up a string and loop through the profile array to clean it up.
$profile = '';
for ($i=0; $i<count($file); $i++) {
    $profile .= trim($file[$i]);
}
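The same trim-and-join step can also be written without the explicit loop. This is just an alternative sketch, and the sample lines are invented for illustration:

```php
<?php
// Invented sample lines standing in for the result of file().
$file = ["  <html>\n", "\t<head>\r\n", "</head>  \n"];

// Trim every line, then glue the lines into one string.
$profile = implode('', array_map('trim', $file));

echo $profile, "\n";
```

Either version leaves you with the whole page as one string with no line breaks, which is what the explode calls need.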
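Now we use simple explode() calls to do the rest of the work. What we’re looking for is the “<meta” description tag near the top of the file, which holds the basic details. (This got put in place when MySpace partnered with Google for search optimization. Thanks, guys.) Note that we lowercase the whole profile first so the match doesn’t depend on capitalization, which means the details come back lowercase too.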
$det_arr = explode('<meta name="description" content="myspace profile - ', strtolower($profile));
$det_arr = explode('" />', $det_arr[1]);
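If the meta tag isn’t there, $det_arr[1] won’t exist and everything after it comes out empty. Here is a hedged sketch of the same two-step explode with a check at each step. The helper name between() and the sample markup are made up for this example.

```php
<?php
// Hypothetical helper: return the text between two markers,
// or null if either marker is missing.
function between(string $haystack, string $start, string $end): ?string
{
    $parts = explode($start, $haystack, 2);
    if (count($parts) < 2) {
        return null; // start marker not found
    }
    $rest = explode($end, $parts[1], 2);
    if (count($rest) < 2) {
        return null; // end marker not found
    }
    return $rest[0];
}

$start = '<meta name="description" content="myspace profile - ';
$end   = '" />';

// Invented sample markup, not copied from a real profile.
$sample = '<html><head><meta name="description" content="MySpace Profile - '
        . 'Tom, 32, Male, , Texas, US, hello" /></head></html>';

$description = between(strtolower($sample), $start, $end);
var_dump($description);
```

A null result tells you the page layout doesn’t match what the script expects.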
We’ve got the whole string that we need, but now we need to separate it into an array we can manage. This is the prettiest part of this script. MySpace prints each element even if it’s blank, so if you leave your city blank, we’ll get a nice little “…Male, , Texas,…” string. Note the double commas. That means we can explode on the comma and still get a consistently indexed array. (Index 3 will always be city, even if it’s blank. And index 4 will always be state, even if city is blank. Make sense?)
$details = explode(',', $det_arr[0]);
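To see the blank-city behavior for yourself, here is a tiny sketch with an invented details string. The trim() pass is an extra step, not part of the script above; it strips the space that follows each comma.

```php
<?php
// Invented details string with the city left blank.
$description = 'tom, 32, male, , texas, us, hello';

$details = explode(',', $description);

// Extra step: strip the space left behind after each comma.
$details = array_map('trim', $details);

print_r($details);
```

The blank city still takes up index 3, so state stays at index 4.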
Now that we have the array we want, we simply assign its elements to descriptive keys.
$det['name'] = $details[0];
$det['age'] = $details[1];
$det['sex'] = $details[2];
$det['city'] = $details[3];
$det['state'] = $details[4];
$det['country'] = $details[5];
$det['phrase'] = $details[6];
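The same labeling can be done in one go with array_combine(). This is only a sketch: the function name label_details() is made up, and it assumes the seven-field order used above. Passing a limit to explode() keeps any commas inside the phrase in the last element.

```php
<?php
// Hypothetical helper: turn the comma-separated details into a keyed array.
function label_details(string $description): array
{
    $keys = ['name', 'age', 'sex', 'city', 'state', 'country', 'phrase'];

    // Limit the split so commas inside the phrase stay in the last element.
    $values = array_map('trim', explode(',', $description, count($keys)));

    // Pad with empty strings so array_combine() always gets equal lengths.
    $values = array_pad($values, count($keys), '');

    return array_combine($keys, $values);
}

// Invented sample string; note the comma inside the phrase.
$det = label_details('tom, 32, male, , texas, us, hi, there');
print_r($det);
```

If the string is short or empty, the missing keys simply come back as empty strings.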
… and print the results.
print_r($det);
That’s it! You can get a working version of the script here. Enjoy!
Your Friend and Mine,
Meshach
12 comments so far
Hi,
nice tutorial, thanks ;-)
I just translated it into German; you can find it here:
http://www.php-developer-blog.de/50226711/myspacecomprofil_einfach_mit_php_auslesen.php
Regards from Stuttgart
Conny
Rock on Conny, rock-on!
[...] the WaxJelly blog today comes a handy bit of code for anyone out there looking to scrape details from just about any MySpace page out there (quick [...]
How do I run the script?
I’m not really sure how I’m going to use this one. I don’t have much reason for scraping MySpace profiles that I can think of.
Am I missing something? I made the page, then downloaded the working version to make sure, but I don’t get any info… just “Array ( [name] => [age] => [sex] => [city] => [state] => [country] => [phrase] => )” prints out in the browser and none of my info… any help would be appreciated. I’m working on one of those fancy flash layouts for my myspace page and this would put it over the top… Will gladly share when / if it gets done…
Don’t you hate when you download something and can’t find it and have to download it again?
“Am I missing something? I made the page, then downloaded the working version to make sure, but I don’t get any info”
Looking at a MySpace profile, it seems the info is now stored in [title], not in [meta], so the script should be changed accordingly.
I don’t think this version works anymore. MySpace has been changing a lot of stuff on their site, and very often.
I want to use this…but I get the array printing and not the actual data…any help appreciated
http://www.jenomedia.com/crawlmyspace/crawlmyspace.php
I’m trying to replace a block of text.
.contactTable .whitetext12{VISIBILITY:HIDDEN;}
DIV
Your Text
Anyone know what the problem is?