Streetview Scraper v2: Virtually free, arbitrary size Street View images

2023-12-26

Google Maps

Hacks

Final Result

Disclaimer: This is most definitely against the Maps Platform's Terms of Service and should therefore be used at your own risk of ban or retribution. This article also doesn't address the morality of skirting the ToS in this manner. I have used this to gather a dataset for academic research purposes, which I personally think is fair game.

Demo: Example showing the client and server running, with the show-browser option used on the client.

Image 1: Comparison of the scraper v2 vs. the Streetview Static API

Features:

Basically free scraping of Street View images (0.014 USD per worker initialization).
Launch multiple workers in parallel.
Get multiple angles and different time periods for each location.
Arbitrarily sized images without watermarks.

Find the code here: https://github.com/lhovon/streetview-scraper-v2

Writeup

While writing the last post on scraping street view images, I realized that one could probably take the post's main idea further and scrape as many images as they wanted for essentially free.

Of course, I had to try it out, and I can confirm that it is indeed possible with only a few changes to the scraper! I took the opportunity to clean up some of the previous code as well.

First, instead of keeping a bunch of JS in the Selenium client, we can define the functions on the web page and simply invoke them from Selenium. JS_CHANGE_LOCATION calls the new function that updates the Street View panorama for a new location, free of charge.

JS_MOVE_RIGHT = "moveRight();"
JS_MOVE_LEFT = "moveLeft();"
JS_ADJUST_HEADING = "adjustHeading(%(deg)s, %(pitch)s);"
JS_RESET_CAMERA = "resetCamera(%(pitch)s);"
JS_CHANGE_ZOOM = "window.sv.setZoom(%(zoom)s);"
JS_RESET_INITIAL_POSITION = "window.sv.setPano(document.getElementById('initial-pano').innerText);"
# Change the date of streetview panos
JS_CHANGE_DATE = """
    window.sv.setPano('%(pano)s');
    document.getElementById('initial-pano').innerText = '%(pano)s';
    document.getElementById('current-date').innerText = '%(date)s';
"""
# NEW: Fetch the initial panorama for a new location and update the street view
JS_CHANGE_LOCATION = """
    changeMapPosition(%(lat)s, %(lng)s);
    document.getElementById('case-id').innerText = "%(id)s";
"""

I'll let you browse the code for the moving and heading adjustment functions; let's focus on the important stuff.
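Before moving on, here is a rough sketch of how the client might invoke one of these templates: the string is filled in with Python's %-style dictionary formatting and handed to Selenium's `execute_script`. The `change_location` helper and the `RecordingDriver` stand-in are assumptions made for illustration, not the repository's exact code; a real run would pass a Selenium WebDriver instead of the stand-in.

```python
# Template copied from the constants listed earlier.
JS_CHANGE_LOCATION = """
    changeMapPosition(%(lat)s, %(lng)s);
    document.getElementById('case-id').innerText = "%(id)s";
"""


def change_location(driver, case_id, lat, lng):
    """Fill in the template and run it in the page (hypothetical helper)."""
    script = JS_CHANGE_LOCATION % {"id": case_id, "lat": lat, "lng": lng}
    # Selenium's WebDriver.execute_script runs a JavaScript string in the page.
    driver.execute_script(script)
    return script


class RecordingDriver:
    """Stand-in for a Selenium WebDriver so the sketch runs without a browser."""

    def __init__(self):
        self.scripts = []

    def execute_script(self, script):
        self.scripts.append(script)


driver = RecordingDriver()
print(change_location(driver, "case-42", 45.5017, -73.5673))
```

Because the values are interpolated directly into JavaScript, this approach only suits trusted inputs such as your own list of IDs and coordinates.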

The code to change location is very similar to the street view initialization code. We send our target location's coordinates to StreetViewService and receive the panorama ID at the location, along with the IDs of adjacent panoramas and those of other time periods. To initialize the street view, we then instantiate a StreetViewPanorama with the location's panorama ID, which is what incurs a charge.

As you may have guessed, the simple hack is to instantiate a single StreetViewPanorama that is reused for every location. We get the new location's panoramas through StreetViewService and update the street view with StreetViewPanorama.setPano(), all free of charge.

async function changeMapPosition(lat, lng) {
  const { StreetViewService } = await google.maps.importLibrary("streetView");
  const svService = new StreetViewService();

  let coordinates = { 
    lat: lat,
    lng: lng,
  };
  window.coordinates = coordinates;

  let panoRequest = {
    location: coordinates,
    preference: google.maps.StreetViewPreference.NEAREST,
    radius: 10,
    source: google.maps.StreetViewSource.OUTDOOR
  };
  await changeLocations(svService, panoRequest);
}


async function changeLocations(svService, panoRequest) {
  const { spherical } = await google.maps.importLibrary("geometry");
  const { StreetViewStatus } = await google.maps.importLibrary("streetView");
  const coordinates = panoRequest.location;

  // Send a request to the panorama service
  svService.getPanorama(panoRequest, (panoData, status) => {
    if (status === StreetViewStatus.OK) 
    {
      console.debug(`Status ${status}: panorama found.`);
      const panoId = panoData.location.pano;
      const panoDate = getPanoDate(panoData.imageDate);
      const otherPanos = getOtherPanosWithDates(panoData.time);
      const heading = spherical.computeHeading(panoData.location.latLng, coordinates);

      // Adjust the zoom level based on distance to the location
      const dist = distBetween(coordinates, panoData.location.latLng);
      const zoom = getZoomLevel(dist);

      // Update the streetview
      window.sv.setPano(panoId);
      window.sv.setPov({heading: heading, pitch: 0});
      window.sv.setZoom(zoom);
      window.sv_marker.setPosition(coordinates);

      // Store these in the document for the client to access
      // The initial pano is used to easily reset the camera to the initial position.
      // It is location dependent, so it is updated on every location change.
      document.getElementById('initial-pano').innerText = panoId;
      document.getElementById('current-date').innerText = panoDate;
      document.getElementById('other-panos').innerText = JSON.stringify(otherPanos);
    }
    else {
      const radius = panoRequest.radius;
      if (radius >= 100) {
        console.log(`Status ${status}: Could not find panorama within ${radius}m! Giving up.`);
        alert('ERROR');
      }
      // Retry with increased search radius
      else {
        panoRequest.radius += 25;
        console.log(`Status ${status}: could not find panorama within ${radius}m, trying ${panoRequest.radius}m.`);
        return changeLocations(svService, panoRequest);
      }
    }
  });
}
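To make the retry branch concrete, here is a small Python sketch of the sequence of search radii that `changeLocations` would go through if every lookup failed. The function name and parameters are illustrative; only the numbers (start at 10 m, grow by 25 m, give up once the radius is 100 m or more) come from the code above.

```python
def search_radii(start=10, step=25, give_up_at=100):
    """Return the radii (in metres) tried before giving up.

    Mirrors the retry branch: after a failed lookup the search stops if
    the radius is already >= give_up_at, otherwise it grows by step.
    """
    radii = [start]
    while radii[-1] < give_up_at:
        radii.append(radii[-1] + step)
    return radii


print(search_radii())  # [10, 35, 60, 85, 110]
```

Note that the last attempt is made at 110 m: the check against 100 m happens only after a lookup has failed, so the final radius overshoots the threshold.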

On the scraping client side, the screenshot function now takes a list of coordinates to scrape and takes care of changing the location every time. At each location, it takes multiple screenshots, moving around using the links and optionally switching to other time periods.

In my default config, the camera turns 60 degrees right and left at the first position, and the client attempts to find a winter date (when obstructions such as trees hide less of the buildings). The client attempts to snap 7 screenshots per time period at a location. You could also play with the camera pitch (up-down tilt) if you're looking at potentially tall structures.

Here's the updated screenshot function:

class StreetviewScreenshotClient():
    ...
    def screenshot(self, cases, worker_id=0, additional_pano_selector=None):
        """
        additional_pano_selector is a function taking list of all available panos and a set 
        of the selected panos as arguments and returning additional panoramas to scrape.
        """
        if additional_pano_selector is None:
            # Default to taking 2 other available panos
            additional_pano_selector = lambda panos, _: panos[:2]

        total_cases = len(cases)
        needs_initialization = True

        # Go over all locations - updating the street view each time
        for i, (id, lat, lng) in enumerate(cases):
            t0 = time.time()
            # Initialize the streetview for the first location
            if needs_initialization:
                # Initialize the streetview container at the initial coordinates
                try:
                    self.driver.get(f"http://127.0.0.1:5000/?id={id}&lat={lat}&lng={lng}")
                    # This is a proxy to know when the streetview is visible
                    self.wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'gm-iv-address-link')))
                    time.sleep(.5)
                    needs_initialization = False

                # We throw an error alert if we can't load the streetview
                # AttributeError is also thrown sometimes when selenium can't find
                # an element matching '.gm-iv-address-link'
                except (UnexpectedAlertPresentException, AttributeError):
                    print(f'Worker {worker_id} - {id} ERROR ({i} / {total_cases})')
                    continue
            else:
                # Random 1-2 s sleep between location changes
                # (needs `from random import uniform`; randrange(1, 2) always returns 1)
                time.sleep(uniform(1, 2))
                self.change_location(id, lat, lng)

            try:
                # Get some important info from the webpage
                current_pano = self.driver.find_element(By.ID, 'initial-pano').text
                current_date = self.driver.find_element(By.ID, 'current-date').text
                additional_panos = []
                panos_picked = set([current_pano])

                if other_panos_text := self.driver.find_element(By.ID, 'other-panos').text:
                    other_panos = json.loads(other_panos_text)
                    additional_panos = additional_pano_selector(other_panos, panos_picked)

                all_dates = [{'pano': current_pano, 'date': current_date}] + additional_panos

                # Take all screenshots at this location
                for j, to_parse in enumerate(all_dates):
                    pano, date = to_parse.values()
                    if j > 0:
                        self.set_date(pano, date)
                    self.take_screenshot()
                    # Turn the camera right and left a bit
                    self.adjust_heading(60)
                    self.take_screenshot()
                    self.adjust_heading(-120)
                    self.take_screenshot()
                    # Readjust the camera orientation towards the location coordinates
                    self.reset_camera_to_coordinates()

                    self.move('right', num_times=1)
                    self.reset_camera_to_coordinates()
                    self.take_screenshot()

                    self.move('right', num_times=1)
                    self.reset_camera_to_coordinates()
                    self.take_screenshot()
                    # Move back to the initial panorama position
                    self.reset_intial_position()

                    self.move('left', num_times=1)
                    self.reset_camera_to_coordinates()
                    self.take_screenshot()

                    self.move('left', num_times=1)
                    self.reset_camera_to_coordinates()
                    self.take_screenshot()

                print(f'Worker {worker_id} scraped {id} in {round(time.time() - t0,2)}s ({i} / {total_cases})')
            except:
            ...
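As an illustration of the `additional_pano_selector` hook, here is a minimal sketch of a selector that prefers winter panoramas. It assumes each entry is a dict of the form `{'pano': '<id>', 'date': 'YYYY-MM'}`; the key order matches how `screenshot` unpacks entries, but the date format is a guess and may differ from what the page actually stores. The name `winter_pano_selector`, the `limit` parameter, and the choice of winter months are all hypothetical.

```python
WINTER_MONTHS = {12, 1, 2}  # assumption: northern-hemisphere winter


def winter_pano_selector(panos, panos_picked, limit=2):
    """Prefer panoramas dated in winter, skipping ones already picked.

    Assumes each entry looks like {'pano': '<id>', 'date': 'YYYY-MM'};
    the real date format produced by the page may differ.
    """
    selected = []
    for entry in panos:
        if entry["pano"] in panos_picked:
            continue
        month = int(entry["date"].split("-")[1])
        if month in WINTER_MONTHS:
            selected.append(entry)
        if len(selected) == limit:
            break
    return selected


example = [
    {"pano": "a", "date": "2019-07"},
    {"pano": "b", "date": "2020-01"},
    {"pano": "c", "date": "2021-12"},
]
print([p["pano"] for p in winter_pano_selector(example, {"a"})])
```

Under those assumptions it could be passed as `client.screenshot(cases, additional_pano_selector=winter_pano_selector)`, since `screenshot` calls the selector with the list of available panos and the set of already selected ones.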

Results

I've used this to scrape close to 80,000 images of 7,000 buildings of interest. Using the static API, this would have cost around 560 USD. Using version 1 of this scraper, it would have cost around 100 USD, which was acceptable.

Using v2 of the scraper, it cost me around 2 dollars (including all of development and testing). I scraped the images using 4 parallel workers in around 14 hours on my laptop.
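The figures above can be reproduced with some back-of-the-envelope arithmetic. The per-unit prices in this sketch are assumptions inferred from the article's own numbers (560 USD for 80,000 static images, and 0.014 USD per Street View initialization), not values checked against a current price list.

```python
IMAGES = 80_000
BUILDINGS = 7_000
WORKERS = 4

# Assumed unit prices, inferred from the figures quoted in this post.
STATIC_API_USD_PER_IMAGE = 0.007  # 560 USD / 80,000 images
STREETVIEW_USD_PER_INIT = 0.014   # one StreetViewPanorama initialization

# Static API: every image is billed.
static_api_cost = IMAGES * STATIC_API_USD_PER_IMAGE
# Scraper v1: one billed initialization per building.
v1_cost = BUILDINGS * STREETVIEW_USD_PER_INIT
# Scraper v2: one billed initialization per worker, reused for all locations.
v2_cost_per_run = WORKERS * STREETVIEW_USD_PER_INIT

print(f"Static API: {static_api_cost:.2f} USD")
print(f"Scraper v1: {v1_cost:.2f} USD")
print(f"Scraper v2: {v2_cost_per_run:.3f} USD per run with {WORKERS} workers")
```

A single clean run with 4 workers would thus cost only a few cents; the rest of the roughly 2 USD presumably comes from the many extra initializations made during development and testing.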

Image 2: Resulting bill - the highlighted costs are from running version 1 of the scraper