Streetview Scraper
For an improved version of this project, see https://loichovon.com/posts/streetview-scraper-v2.html
Disclaimer: This is most definitely against the Maps Platform's Terms of Service and should therefore be used at your own risk of ban or retribution. This article also does not address the morality of skirting the ToS in this way. I've used this to gather a dataset for academic research purposes, which I personally think is fair game.
Image 1: Example comparison of equivalent images taken from the static API and the JS API
Features:
- Cheap scraping of Street View images (0.014 USD per location, theoretically lowerable to 0.014 USD for any number of locations if you extend this).
- Launch multiple parallel workers.
- Automatically get multiple angles and different time periods for a location.
- Arbitrarily sized images without watermarks.
Get the code here: https://github.com/lhovon/streetview-scraper-v1
I needed to acquire a Street View image dataset for a project at work (I work for a research project trying to do large-scale deep energy retrofits in Quebec) to train and use computer vision models that automatically detect certain building features. Stay tuned for an upcoming post about some cool experiments we did with them!
I wanted multiple angles of each building, like when moving around in Street View, as well as images from previous time periods. Doing this with the Static API was cumbersome.
AFAIK, only the JavaScript API sends back the IDs of the links (the arrow controls to adjacent panoramas) when you call StreetViewService.getPanorama() and of the other time periods (covered in the Time Travel article).
After failing to use projects like Js2Py, I resorted to using Selenium to load the JS API. For a given location, I could now get the panorama IDs for the links and the previous time periods. For each of the other time periods, I had to get the links separately. Finally, for each panorama I had to calculate the heading (the direction the camera faces) before requesting the image through the static API. This was quite expensive at 0.007 USD per picture. I racked up an $800 bill scraping images for ~10,000 buildings. In addition, the images were only 640×640 pixels.
Note: there is a way to do this using Google's backend API directly like robolyst is doing here but I didn't have the time (nor motivation) to do this.
At some point, I realized I could do everything from the JS API if I could take screenshots from a headless browser. It would also be cheaper, since Google only charges to initialize the Street View container, and not for subsequent panorama changes.
Building on top of the last post's Street View screenshotting functionality, we'll use Selenium to load the JS API and scrape images, automatically moving around using the links and changing time periods. Here we load a new container for each place of interest, but you could easily abuse this further by calling StreetViewService.getPanorama() for each of your locations, followed by StreetViewPanorama.setPano(), always on the same container.
Using this little workaround, we can scrape arbitrarily sized images for a fraction of the cost. The tradeoff is paid in time, as going through Selenium is much slower than using the Static API. Furthermore, I've found the need to wait between screenshots when moving around, to give the container enough time to fully update. Taking 10 screenshots at a single location takes about 30 seconds. However, we can parallelize going over multiple locations as each Selenium instance is fully independent.
The last two posts about Google Maps cover aspects of this post in detail. In the Time Travel post we see how to initialize the Street View container and change between the available time periods for a location, and in the screenshot post we see how to use html2canvas to take clean screenshots. We won't cover everything here so if anything is confusing, try going through those posts!
Image 2: System architecture diagram. We use a headless browser driven by Selenium to be able to execute the JavaScript needed to load the Maps JS API and take screenshots. Using JavaScript we can also interact with Street View to change position, zoom, heading, etc.
Starting from this base, we'll set up a simple flask web server with two REST endpoints: GET / which serves the screenshot view, and POST /upload which will save the screenshots to disk. The server generates web pages dynamically by rendering jinja2 templates and substituting in any variables for their values. This allows us to pass data from the server to the frontend, and to the JavaScript executing in the browser.
The frontend needs a few things: a Maps API key and the coordinates of the place we're interested in. I've also added an identifier to simplify naming the screenshots.
# server.py
from flask import Flask, render_template, request

app = Flask(__name__)
MAPS_API_KEY = "YOUR_API_KEY"  # Hide this in a .env file

@app.route("/", methods=['GET'])
def screenshot():
    id = request.args.get('id')
    lat = request.args.get('lat', 45.531776760335504)  # default values if none
    lng = request.args.get('lng', -73.55924595184348)
    return render_template('index.html', id=id, lat=lat, lng=lng, key=MAPS_API_KEY)
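To point a browser (or Selenium) at this view for a given building, the query string has to carry the `id`, `lat` and `lng` parameters read above. A minimal sketch of a URL builder follows; the helper name `build_screenshot_url` and the base address (Flask's default development host and port) are assumptions for illustration, not part of the original code.

```python
from urllib.parse import urlencode


def build_screenshot_url(building_id, lat, lng, base="http://127.0.0.1:5000/"):
    """Return the GET / URL that renders the screenshot view for one location.

    `base` assumes the Flask dev server default; change it to match your setup.
    """
    query = urlencode({"id": building_id, "lat": lat, "lng": lng})
    return f"{base}?{query}"


# Hypothetical identifier, coordinates taken from the defaults in server.py
print(build_screenshot_url("bldg-001", 45.531776760335504, -73.55924595184348))
```

Using `urlencode` rather than string concatenation keeps identifiers with spaces or special characters from breaking the request.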
From the template, we pass the data to the JavaScript with data attributes:
<!-- templates/index.html -->
...
<script src="{{ url_for('static', filename="scripts/screenshot.js") }}"
data-id="{{ id }}">
</script>
<script src="{{ url_for('static', filename="scripts/maps.js") }}"
data-lat="{{ lat }}"
data-lng="{{ lng }}">
</script>
and we retrieve them in the JS via document.currentScript.dataset:
// static/scripts/maps.js
const mapsData = document.currentScript.dataset;
let coordinates = {
  lat: parseFloat(mapsData.lat), // data attributes are strings
  lng: parseFloat(mapsData.lng),
};
... // load the embedded Street View container
Passing data from the JS to the server is easy using fetch. When the screenshot button is clicked, we'll send the image to the /upload endpoint, along with a few other things.
async function screenshotStreetview(e) {
  e.preventDefault();
  const postData = {
    id: dataset.id,
    pano: window.sv.getPano(),
    date: document.getElementById('current-date').innerText,
    img: await screenshot('streetview'),
  }
  fetch("/upload", {
    method: "POST",
    mode: "same-origin",
    cache: "no-cache",
    credentials: "same-origin",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify(postData),
  }).then(() => alert('OK')); // alert will be used by Selenium to know if the upload was successful
}
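On the server side, the endpoint receiving this POST has to turn the JSON payload back into an image file. The sketch below shows one way this could look, assuming `img` arrives as a base64 data URL (the format `canvas.toDataURL()` produces); the post does not show the handler, so the function name `save_screenshot`, the output directory and the file naming scheme are all hypothetical.

```python
import base64
import re
from pathlib import Path


def save_screenshot(payload: dict, out_dir: str = "screenshots") -> Path:
    """Decode the posted image and write it to <out_dir>/<id>_<date>_<pano>.<ext>.

    Assumes payload["img"] is a base64 data URL such as "data:image/png;base64,...".
    """
    match = re.match(r"data:image/(png|jpeg);base64,(.+)", payload["img"], re.DOTALL)
    if match is None:
        raise ValueError("img is not a base64 PNG/JPEG data URL")
    ext, b64_data = match.groups()

    def safe(value) -> str:
        # Dates such as "Nov 2019" contain spaces, so normalize every name part
        return re.sub(r"[^A-Za-z0-9_-]+", "-", str(value))

    name = f"{safe(payload['id'])}_{safe(payload['date'])}_{safe(payload['pano'])}.{ext}"
    path = Path(out_dir) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(base64.b64decode(b64_data))
    return path
```

Inside a Flask view, this could be called as `save_screenshot(request.get_json())`. Including the panorama ID in the file name keeps screenshots from different positions and time periods from overwriting each other.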
We also need to pass data to Selenium. We'll simply store it in the DOM and scrape it. We're passing Selenium the list of available panoramas from all available time periods at the location, the current panorama and the current date.
svService.getPanorama(panoRequest, (panoData, status) => {
  if (status === StreetViewStatus.OK) {
    const panoId = panoData.location.pano;
    const panoDate = getPanoDate(panoData.imageDate); // Converts date format
    const otherPanos = getOtherPanosWithDates(panoData.time); // Converts date format
    const heading = spherical.computeHeading(panoData.location.latLng, coordinates);
    // This is charged 0.014 USD
    const sv = new StreetViewPanorama(document.getElementById('streetview'), {
      position: coordinates,
      center: coordinates,
      zoom: 0,
      pov: {pitch: 0, heading: heading}
    });
    sv.setPano(panoId);
    // Save these in window for easy access later
    window.sv = sv;
    window.computeHeading = spherical.computeHeading;
    // Store these in the document for the client to access
    document.getElementById('initial-position-pano').innerText = panoId;
    document.getElementById('current-date').innerText = panoDate;
    document.getElementById('other-panos').innerText = JSON.stringify(otherPanos);
  }});
As we said, we'll use Selenium so we can execute JavaScript and load the Street View. We define a simple class for the client. The window_size parameter affects the screenshot size (screenshots are slightly smaller in width than the window size e.g. 1901×1080 for 1920×1080 window size). Taking screenshots is as easy as clicking the button via Selenium.
import os
import time

from selenium.webdriver import chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

class StreetviewScreenshotClient():
    def __init__(self, window_size="1920,1080"):
        chrome_opts = chrome.options.Options()
        chrome_opts.add_argument(f"window-size={window_size}")  # Affects the picture size
        chrome_opts.add_argument("--log-level=3")  # hide logs
        chrome_svc = chrome.service.Service(log_output=os.devnull)  # hide logs
        self.driver = chrome.webdriver.WebDriver(service=chrome_svc, options=chrome_opts)
        self.wait = WebDriverWait(self.driver, 10)

    def take_screenshot(self):
        self.driver.find_element(By.ID, 'btn-screenshot').click()
        self.wait.until(EC.alert_is_present())  # alert thrown after the fetch()
        self.driver.switch_to.alert.accept()
        time.sleep(.5)
We can use WebDriver.execute_script() to interact with the Street View container from Selenium.
For example, we can move right by executing the following script. The div.gmnoprint.SLHIdE-sv-links-control query selector is likely specific to v3.53 of the Maps API and may need to be changed in the future. I'm using a heuristic to find the right link here, so this doesn't work 100% of the time but is good enough. We have a similar script to move left.
JS_MOVE_RIGHT = """
const links = document.querySelector('div.gmnoprint.SLHIdE-sv-links-control').firstChild.querySelectorAll('[role="button"]');
var index = 0;
if (links.length === 2 || links.length === 3)
  index = 0;
else if (links.length === 4)
  index = 1;
links[index].dispatchEvent(new Event('click', {bubbles: true}));
"""
In order to keep facing towards the point of interest, we need to recompute the heading every time we move. We do this easily with the google.maps.geometry.spherical.computeHeading function that we conveniently made available via window. We can optionally pass a pitch parameter when calling it from the client.
JS_ADJUST_HEADING = """
window.sv.setPov({
heading: window.computeHeading(window.sv.getPosition().toJSON(), window.coordinates),
pitch: %s
});
"""
In the client we use these scripts like this:
class StreetviewScreenshotClient():
    ...
    def move(self, direction, num_times=1):
        if direction == 'left':
            move_script = JS_MOVE_LEFT
        elif direction == 'right':
            move_script = JS_MOVE_RIGHT
        else:
            raise Exception('Left or Right only')
        for _ in range(num_times):
            self.driver.execute_script(move_script)
            time.sleep(.3)

    def readjust_heading(self, pitch=0):
        self.driver.execute_script(JS_ADJUST_HEADING % str(pitch))
        time.sleep(.5)
The sleep calls are necessary for the Street View container to fully update after interacting with it. These values were hand-tuned for my laptop so might have to be changed, e.g. increased for larger window sizes. Leaving them out results in blurry screenshots.
Image 3: Example of blurry screenshots when we don't sleep after moving around
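As an aside on the heading readjustment used when moving around: the value we need is the compass bearing from the camera position to the building. For intuition, here is a minimal Python sketch of the standard initial-bearing formula on a sphere. It is an assumption that this matches what the Maps geometry library computes internally; the function name `compute_heading` and the dict-based coordinates are illustrative only.

```python
import math


def compute_heading(origin: dict, target: dict) -> float:
    """Initial great-circle bearing from origin to target.

    Both arguments are {"lat": ..., "lng": ...} in degrees. The result is in
    degrees clockwise from north, normalized to the range [-180, 180).
    """
    lat1 = math.radians(origin["lat"])
    lat2 = math.radians(target["lat"])
    d_lng = math.radians(target["lng"] - origin["lng"])
    y = math.sin(d_lng) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(d_lng)
    heading = math.degrees(math.atan2(y, x))
    return (heading + 180.0) % 360.0 - 180.0


# A camera slightly south-west of a building should look roughly north-east
camera = {"lat": 45.5316, "lng": -73.5594}
building = {"lat": 45.5318, "lng": -73.5592}
print(round(compute_heading(camera, building), 1))
```

Computing the heading on the Python side like this could be handy for sanity-checking screenshots offline, but in the scraper itself calling the JS function through `execute_script` avoids any mismatch with the API.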
We've saved the list of other panoramas available at the location in the DOM. This list gives the time period (month and year) of each panorama. I've made it so you can pass a function to select other available time periods to scrape.
For example, we can select a winter month like this:
def select_one_winter_month(other_dates: list, panos_picked: set):
    additional_panos = []
    # We reverse the list because dates are usually given in chronological
    # order but we're interested in more recent panoramas
    for date in reversed(other_dates):
        month = date['date'].split(' ')[0]
        if month in ['Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr']:
            # Avoid duplicate panos
            if date['pano'] in panos_picked:
                continue
            additional_panos.append(date)
            panos_picked.add(date['pano'])
            break
    return additional_panos
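The same pattern generalizes to other seasons or to picking more than one panorama. Below is a sketch of a small factory that builds selectors with the same `(other_dates, panos_picked)` signature. The name `make_month_selector` and its `limit` parameter are hypothetical, and the sketch assumes the same entry format as above: dicts with a `pano` ID and a `date` string starting with a three-letter month.

```python
def make_month_selector(months, limit=1):
    """Build a selector that picks up to `limit` recent panoramas from `months`."""
    wanted = set(months)

    def selector(other_dates: list, panos_picked: set) -> list:
        picked = []
        # Walk from most recent to oldest, assuming chronological input order
        for entry in reversed(other_dates):
            if len(picked) >= limit:
                break
            month = entry["date"].split(" ")[0]
            if month in wanted and entry["pano"] not in panos_picked:
                picked.append(entry)
                panos_picked.add(entry["pano"])
        return picked

    return selector


# Example with made-up panorama IDs: pick up to two summer panoramas
select_summer = make_month_selector(["Jun", "Jul", "Aug"], limit=2)
other = [
    {"pano": "a", "date": "Jul 2012"},
    {"pano": "b", "date": "Jan 2015"},
    {"pano": "c", "date": "Aug 2019"},
    {"pano": "d", "date": "Jun 2021"},
]
print(select_summer(other, {"d"}))  # "d" is skipped because it is already picked
```

Like the winter selector, the returned function mutates `panos_picked` so the same panorama is never scraped twice for one location.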
This selector function is then passed to the client's screenshot function. For each time period, the client is configured to take a screenshot at the initial position, move right twice, reset its position and move left twice, taking a screenshot after each move, for a total of 5 screenshots.
class StreetviewScreenshotClient():
    ...
    def screenshot(self, id, lat=None, lng=None, additional_pano_selector=None):
        ...
        additional_panos = []
        panos_picked = set([current_pano])
        if other_panos_text := driver.find_element(By.ID, 'other-panos').text:
            other_panos = json.loads(other_panos_text)
            # Here we call the function to get the additional panos to scrape
            additional_panos = additional_pano_selector(other_panos, panos_picked)
        all_dates = [{'pano': current_pano, 'date': current_date}] + additional_panos
        for i, to_parse in enumerate(all_dates):
            pano, date = to_parse.values()
            if i > 0:
                self.set_date(pano, date)
            self.take_screenshots(zooms=zooms)
            self.move('right', num_times=1)
            self.readjust_heading(pitch=pitch_mod)
            self.take_screenshots(zooms=zooms)
            self.move('right', num_times=1)
            self.readjust_heading(pitch=pitch_mod)
            self.take_screenshots(zooms=zooms)
            self.reset_intial_position()
            self.move('left', num_times=1)
            self.readjust_heading(pitch=pitch_mod)
            self.take_screenshots(zooms=zooms)
            self.move('left', num_times=1)
            self.readjust_heading(pitch=pitch_mod)
            self.take_screenshots(zooms=zooms)
And that's pretty much it for core functionality. I've hardcoded 10 test cases in the POC code, but you'll typically get them from a database. There's also code to launch multiple parallel workers in order to speed up larger-scale scraping.
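To give an idea of how the parallel part could be organized, here is a minimal sketch that deals locations out to a fixed number of workers, each of which would own one Selenium client. The repository's actual worker code may differ. The helper names and the use of `ThreadPoolExecutor` are assumptions, and `fake_scrape_batch` is a stand-in so the example runs without a browser.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(locations, num_workers):
    """Deal locations out like cards so batches have near-equal sizes."""
    return [locations[i::num_workers] for i in range(num_workers)]


def run_workers(locations, scrape_batch, num_workers=4):
    """Run scrape_batch(batch) once per worker and return the flattened results."""
    batches = [b for b in split_round_robin(locations, num_workers) if b]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(scrape_batch, batches))
    return [item for batch_result in results for item in batch_result]


def fake_scrape_batch(batch):
    # Stand-in for the real work: create one screenshot client, call its
    # screenshot method for every location in the batch, then quit the driver.
    return [loc["id"] for loc in batch]


# Made-up locations for illustration
locations = [{"id": i, "lat": 45.5, "lng": -73.5} for i in range(10)]
print(sorted(run_workers(locations, fake_scrape_batch, num_workers=3)))
```

Threads are enough in this sketch because each worker mostly waits on its own browser process; one client per batch also avoids paying the browser startup cost for every location.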
I've used this to scrape images of 2,049 low-cost housing buildings in Quebec for a research project, obtaining 23,211 images in total. The resulting bill was around 50 CAD, or around 37 USD, which is around 0.018 USD per location. Using the static API, this would have cost around 160 USD.
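The arithmetic behind these figures can be checked in a few lines. The prices and counts below are the ones quoted in this post; the 37 USD bill is the approximate converted amount, so the results are estimates rather than exact billing values.

```python
JS_API_COST_PER_LOAD = 0.014       # USD per Street View container load (from this post)
STATIC_API_COST_PER_IMAGE = 0.007  # USD per Static API image (from this post)

num_locations = 2049
num_images = 23211
bill_usd = 37.0  # approximate, converted from about 50 CAD

static_cost = num_images * STATIC_API_COST_PER_IMAGE
per_location = bill_usd / num_locations
one_load_each = num_locations * JS_API_COST_PER_LOAD

print(f"Static API estimate: {static_cost:.2f} USD")                 # 162.48 USD
print(f"Observed cost per location: {per_location:.3f} USD")         # 0.018 USD
print(f"Static / observed ratio: {static_cost / bill_usd:.1f}x")     # 4.4x
print(f"One container load per location: {one_load_each:.2f} USD")   # 28.69 USD
```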
Image 4: Cloud console bill after scraping 23,211 pics of 2,049 locations (in Canadian dollars). Using the Static API, this would have been around 4 times higher.
For each of our buildings of interest, we also have a polygon of the lot boundaries. I'd like to extend this to automatically go around the lot and take screenshots covering as many sides of the building as possible, versus only one side currently.
Image 5: Future developments include automatically moving to the edges of the lot polygon and taking pictures.