Fetching Android Market Stats with Selenium RC
(2010)
Finally.. I've got a reasonably decent way to pull Android Market stats. For some reason I keep coming back to this topic. This time, the way forward is to use Selenium RC, part of the Selenium browser testing suite.
My example will be in Python, but Selenium has bindings for several languages.
First of all, you gotta download Selenium RC from here: http://seleniumhq.org/download/
Then, extract it someplace you can remember. I've been putting things in ~/opt lately.
Okay, now create a new Python script, comme ça:
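import sys

# Make the Selenium RC Python client importable (adjust to where you extracted it).
sys.path.append('/the/path/to/selenium-python-client-driver-1.0.1')
from selenium import selenium

email = 'YOUR_GOOGLE_LOGIN'
passwd = 'YOUR_PASSWORD'

# Connect to the Selenium RC server on localhost:4444 and launch Firefox.
s = selenium("localhost", 4444, "*firefox", "http://market.android.com")
s.start()

# Open the publisher page and sign in with the Google account.
s.open("/publish/Home")
s.type("Email", email)
s.type("Passwd", passwd)
s.click("signIn")
s.wait_for_page_to_load("30000")

# Count the listingRow divs on the page.
n = int(s.get_xpath_count("//div[@class='listingRow']"))

# XPath positions are 1-based, so range() needs n + 1 to reach the last row.
for i in range(3, n + 1):
    try:
        title = s.get_text("xpath=(//div[@class='listingRow'])[%s]/div[1]/div[1]" % i)
        downloaded = s.get_text("xpath=(//div[@class='listingRow'])[%s]/div[2]/div[1]/span[1]" % i)
        installed = s.get_text("xpath=(//div[@class='listingRow'])[%s]/div[2]/div[2]/span[1]" % i)
        # [1:-1] drops the first and last character of the table text.
        comments = s.get_text("xpath=(//div[@class='listingRow'])[%s]/table" % i)[1:-1]
        # Python 2 print statement.
        print title, downloaded, installed, comments
    except Exception:
        # Rows where a locator doesn't match (probably empty ones) raise; skip them.
        pass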
- Be sure to replace YOUR_GOOGLE_LOGIN with your email (or whatever login you use) and YOUR_PASSWORD with the matching password.
This script is a bit of a trainwreck.. but it works and I don't feel like screwing with it..
- Working with XPath in selenium-rc's Python binding feels really weird.. doesn't seem to behave quite the way you would expect.
- Why does the iteration start at 3? I dunno.. there are some empty rows at the beginning I guess..
- Why is it wrapped in a try-except block? I dunno.. some empty rows at the end?
- It works on Ubuntu 10.04 / FF 3.6.3. Your mileage may vary. I wouldn't be surprised if those XPath selectors needed more tweaking in some cases.
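If the locator strings look like noise, here's a minimal sketch of how they're put together. The helper names `row_locators` and `row_indices` are mine (hypothetical, not part of Selenium); the XPath fragments are copied from the script. One thing worth knowing: XPath positions count from 1, so `range(3, n)` stops at row n - 1, and reaching the last row takes `range(3, n + 1)`.

```python
# Hypothetical helpers, not part of Selenium: they only build strings.
ROW = "xpath=(//div[@class='listingRow'])[%d]"


def row_locators(i):
    """Return the four locator strings for listing row i (1-based)."""
    base = ROW % i
    return {
        "title": base + "/div[1]/div[1]",
        "downloaded": base + "/div[2]/div[1]/span[1]",
        "installed": base + "/div[2]/div[2]/span[1]",
        "comments": base + "/table",
    }


def row_indices(n, first=3):
    """Row positions first..n inclusive (XPath counts from 1, not 0)."""
    return list(range(first, n + 1))
```

Each value in the dict is what you'd hand to `s.get_text(...)`. Keeping the strings in one place should make it a little less painful when the selectors need tweaking.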
To run the script, you first need to start the Selenium RC server. Go to the directory where you extracted it:
cd /path/to/selenium
java -jar selenium-server.jar
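If the script dies right away with a connection error, the server probably isn't up yet. Here's a rough sketch of a check you could run first, using only the standard library. `server_is_listening` is a made-up helper name; the default host and port are the ones the script passes to `selenium(...)`.

```python
import socket


def server_is_listening(host="localhost", port=4444, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: nothing usable is listening.
        return False
    conn.close()
    return True
```

This only tells you that something accepts connections on that port, not that it's actually Selenium, so treat a True as a hint rather than proof.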
Then you should be able to run this script from a terminal. It will start Firefox, log you in to the Android Developer Console, wait a few seconds until the Ajax has all loaded, then use XPath to scrape each row of data from the table and print it to the terminal.
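About that "wait a few seconds" part: as far as I can tell, `wait_for_page_to_load` only covers the page load itself, so if the rows get filled in by Ajax afterwards, polling may be more dependable than hoping. A minimal sketch of a generic polling helper follows; `wait_until` is a hypothetical name of my own, not something Selenium provides.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call predicate() until it returns something truthy or time runs out.

    Returns the truthy value, or None if the timeout is reached first.
    """
    deadline = time.time() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.time() >= deadline:
            return None
        time.sleep(interval)
```

With the session from the script, you could pass something like `lambda: int(s.get_xpath_count("//div[@class='listingRow']"))` as the predicate, which would hand back the row count as soon as it's non-zero.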
From there it should be pretty simple to export the results into a CSV file or make pretty charts or whatever it is you wanna do.
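For the CSV part, here's a small sketch using the standard `csv` module. It assumes you collect each row as a (title, downloaded, installed, comments) tuple in a list instead of printing it; `write_stats_csv` and `FIELDS` are names I made up for illustration. Note this is written for Python 3, where the file is opened with `newline=""`; under Python 2 you'd open it in `"wb"` mode instead.

```python
import csv

# Column names matching the four values the script prints per row.
FIELDS = ["title", "downloaded", "installed", "comments"]


def write_stats_csv(rows, path):
    """Write a header line plus one CSV line per scraped row."""
    # newline="" lets the csv module control line endings (Python 3).
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        writer.writerows(rows)
```

In the scraping loop you'd append `(title, downloaded, installed, comments)` to a list, then call `write_stats_csv(that_list, "stats.csv")` once at the end.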
It does pop up a window on the screen, which is kinda annoying. It'd be cooler to run Firefox headless; maybe some other time..