One of the problems I was faced with at work required me to calculate the total value of transactions many different Shopify sites had within every North Carolina county. To do this, I created a Python script which uses the Google Maps API to process CSV files from Shopify containing all order history for a store. The script needed to be able to identify addresses within North Carolina, then run those through the Google Maps API in order to get the county which the address resides in, and add all order values to a running total per county.

First, I check to make sure the state is North Carolina, and then create a string to give the Google Maps API.

for row in reader:
	if row[statekey] == 'NC':
		exists = False
        # address string is City, State such as Pittsboro, NC
        address_string = row[namekey].lower() + ', ' + row[statekey]

Then, I check to make sure I haven't already checked this address.

for county in list(counties.keys()):
	if address_string in counties.get(county):
		exists = True
        county_name = county

Then, if it doesn't exist, this is where the getCounty() function comes in. This function makes the API request to Google and returns the county name.

def getCounty(address_string):
    maps = googlemaps.Client(key='xxxxx')
    location = maps.geocode(address_string)
    target = 'administrative_area_level_2'

    for loc in location[0]['address_components']:
        if target in loc['types']:
            return loc['long_name']
        else:
            pass

By making sure we only make one request per city or town, we save a huge amount on the requests and make the script run much faster. We then check the county that was returned against those we currently have a total running for, and add the town to the county so we do not check it again.

if not exists:
	print('Looking up county for ', address_string)
    county_name = getCounty(address_string)

    if county_name not in list(counties.keys()):
    	counties.update({county_name: [address_string]})
    else:
    	entries = counties.get(county_name)
        entries.append(address_string)
        counties.update({county_name: entries})

If the county does not already have a running total, we create a new county and set the price to the order total or 0 if there is none.

if county_name not in places:
	print('{} doesn\'t exist, adding.'.format(county_name))
	try:
		priceData = []
		for key in datakeys:
			if row[key] != '':
				priceData.append( round( float( row[key]), 2))
			else:
				priceData.append(0.0)

			places.update({county_name: priceData})

		except ValueError:
			print(sys.exc_info())

Then, if it already has a county we pretty much do the same thing, without creating a new county.

else:
	print('{} Already exists.'.format(county_name))
    try:
    	placevalues = places.get(county_name)
        for i in range(len(datakeys)):
        	if row[datakeys[i]] != '':
            	placevalues[i] += round(float(row[datakeys[i]]), 2)

        places.update({county_name: placevalues})
    except ValueError:
 		print(sys.exc_info())

Finally, this is all output to a CSV file.

def writeData(places, outputfile):
    with open(outputfile, 'w', newline="\n", encoding='utf-8') as output:
        writer = csv.writer(output)
        writer.writerow(['County', 'Order Value', 'Discount Amount', 'Shipping', 'Sales Tax', 'Order Total', 'Card Processing Fee'])

        for county in places.keys():
            rowdata = places.get(county)
            rowdata.insert(0, str(county))

            writer.writerow(rowdata)