Generating Leads and Sales with Open Data

1 June 2017



We have a problem! Sales are slipping at Dunder Mifflin and we need a way to find more customers. How the hell do we do that… 🤔

Dunder Mifflin sells paper. In the modern era, paper is being used less and less. While this is good for the environment, it’s bad news for paper companies. There’s still paper to be produced and sold, just less of it. So, companies like Dunder Mifflin will need to work harder and harder to continue to earn their share of the market.

We’re going to look at how this company can increase and improve sales using open data.

What is Open Data?

To quote the Open Data Institute:

Open data is data that anyone can access, use or share. Simple as that.

The ultimate aim of open data is to promote a better world. The sharing of data can reduce corruption within governments, accelerate scientific advances and save millions of lives.

You might be wondering what this has to do with selling paper. Well, with the advent of the Open Government Partnership, we now have improved access to data on UK companies (our target customers) through Companies House (CH). In the past, access to this data was limited to those who could afford paid subscription services. You could also buy curated lists from external companies, but again, at a cost.

With fair access to data, we now have a level playing field.

How to Use Open Data

You use open data just like you would ‘normal’ data.

Step 1: Define a Goal

We first need to define what we want to achieve as the end goal:

Step 2: Identify Possible Solutions

Now, we need to see how we can achieve that goal:

Step 3: Eradicate the Non-Data Solutions

Now, we have to remove the solutions that cannot be done with data:

Step 4: Determine Data Availability

Now, for the possible solutions, identify what data is required and where to acquire it:

Solution                  Required Data                    CH Provides?
more leads                contact details
higher quality leads      lead data, e.g. business size
new geographical areas    lead location
enter new markets         lead industries or fields

The common theme amongst the various solutions is to focus on refining or expanding the lead generation process and acquiring those leads’ contact details. To do that, we require two pieces of information, which we can acquire via Companies House:

Step 5: Build the Solution

At this point, we’ve identified what data we need and confirmed that we can access it. We now need a method for processing the data and generating leads.

We have two options:

The choice between the two options depends on a variety of factors, including the size of the dataset, update frequency, team access and value to the business. A technical advisor can help with making that decision.

Building the Solution

We get our (open) dataset from download.companieshouse.gov.uk as a single CSV file (~2GB). CSVs can normally be opened in spreadsheet programs, but a file this size is too much for them to handle. We could look for an off-the-shelf solution, but for a task this small, it’s easier to roll our own: all we need is a scripting language to sift through the data and extract the records that match our search criteria.
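Before filtering anything, it helps to confirm the column names and the number of records without opening the file in a spreadsheet. Python’s `csv` module reads one row at a time, so memory use stays small even for a file of this size. Below is a minimal sketch; the function name `inspect_csv` is hypothetical, and a tiny in-memory sample with made-up rows stands in for the real download so the example runs on its own.

```python
import csv
import io


def inspect_csv(stream):
    """Return (header names, number of data rows) for a CSV text stream.

    Rows are read one at a time, so memory use does not grow with file size.
    """
    reader = csv.reader(stream, skipinitialspace=True)
    headers = next(reader)           # first row holds the column names
    count = sum(1 for _ in reader)   # count the remaining rows without storing them
    return headers, count


# Made-up sample standing in for the real Companies House file.
# skipinitialspace=True drops the space that follows each comma here.
sample = io.StringIO(
    'CompanyName, CompanyStatus, RegAddress.PostTown\n'
    'EXAMPLE LTD, Active, SHEFFIELD\n'
    'OTHER LTD, Dissolved, LEEDS\n'
)
headers, count = inspect_csv(sample)
print(headers)  # ['CompanyName', 'CompanyStatus', 'RegAddress.PostTown']
print(count)    # 2
```

On the real file, the same function would be called with `open('open-data.csv', newline='')` as the stream; passing `newline=''` is what the `csv` module documentation recommends for files handed to a reader.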

As an example, we’re going to target solicitors.¹

To generate our leads, I’ve written a small script that scans the entire dataset and extracts all active solicitors based in Sheffield.

import csv

# Scan the Companies House CSV and print every active company in Sheffield
# whose first SIC code is 69102 (Solicitors).
with open('open-data.csv', 'r', newline='') as data:
    reader = csv.reader(data, delimiter=',', quotechar='"', skipinitialspace=True)

    # The first row holds the column names; look up the indices we need
    headers = next(reader)
    idxCoName = headers.index('CompanyName')
    idxStatus = headers.index('CompanyStatus')
    idxSIC1   = headers.index('SICCode.SicText_1')
    idxLine1  = headers.index('RegAddress.AddressLine1')
    idxPC     = headers.index('RegAddress.PostCode')
    idxTown   = headers.index('RegAddress.PostTown')

    # Every remaining row is a company record
    for row in reader:
        # Filter on Status (Active)
        if row[idxStatus] == 'Active':
            # Filter on Post Town (Sheffield)
            if row[idxTown] == 'SHEFFIELD':
                # Filter on SIC (69102 - Solicitors)
                if '69102' in row[idxSIC1]:
                    print('{}\n{}, {}, {}\n'.format(row[idxCoName], row[idxLine1], row[idxTown], row[idxPC]))

This script takes around 30 seconds to run and outputs the following:

...

BUZZ LAW LIMITED
7TH FLOOR 2 PINFOLD STREET, SHEFFIELD, S1 2GU

CARTER THOMAS LIMITED
ELECTRIC WORKS, SHEFFIELD, S1 2BJ

CFA LAW LTD
FOUNTAIN PRECINCT MEZZANINE FLOOR, SHEFFIELD, S1 2JA

COATES SOLICITORS LIMITED
62-64 HIGH STREET, SHEFFIELD, S20 5AE

EMILDA MORGANS LIMITED
C/O WOSSKOW BROWN THE JOHN BANNER CENTRE, SHEFFIELD, S9 3QS

...

The output above is a list of potential leads, complete with each company’s name and registered address, which can be handed straight to the sales team.
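Printing to the terminal works for a quick look, but a sales team will probably find a spreadsheet easier to work with. Here is a minimal sketch that writes leads to CSV instead, assuming the leads have already been collected as dictionaries; the function `write_leads` and the column names `Name`, `Address`, `Town` and `Postcode` are hypothetical choices, not part of the original script.

```python
import csv
import io


def write_leads(leads, stream):
    """Write lead dictionaries to a CSV text stream, starting with a header row."""
    fieldnames = ['Name', 'Address', 'Town', 'Postcode']  # hypothetical output columns
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(leads)


# One lead taken from the script output above.
leads = [
    {'Name': 'BUZZ LAW LIMITED', 'Address': '7TH FLOOR 2 PINFOLD STREET',
     'Town': 'SHEFFIELD', 'Postcode': 'S1 2GU'},
]
out = io.StringIO()
write_leads(leads, out)
print(out.getvalue())
```

To produce a file the sales team can open, pass `open('leads.csv', 'w', newline='')` as the stream instead of the in-memory buffer.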

There are only so many solicitors in Sheffield, so once these leads have been exhausted, it’s time to change the search. That could mean changing the area (e.g. Manchester or Nottingham) or changing the selection criteria (e.g. targeting insurance brokers).
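Changing the area or the target industry only means changing two values, so it may be worth pulling them out as parameters. Here is a minimal sketch using `csv.DictReader`, assuming the same column names as the script above. `find_leads` is a hypothetical helper, and unlike the original script it also checks `SICCode.SicText_2` to `SICCode.SicText_4`. That assumes those columns exist and follow the same naming pattern as `SICCode.SicText_1`, since a company can list more than one SIC code. The sample rows are made up.

```python
import csv
import io

# Assumed column names, following the pattern of 'SICCode.SicText_1'.
SIC_COLUMNS = ['SICCode.SicText_{}'.format(n) for n in range(1, 5)]


def find_leads(stream, town, sic_code):
    """Yield active companies in `town` that list `sic_code` in any SIC column."""
    reader = csv.DictReader(stream, skipinitialspace=True)
    for row in reader:
        if row['CompanyStatus'] != 'Active':
            continue
        # Compare towns case-insensitively, ignoring stray spaces
        if row['RegAddress.PostTown'].strip().upper() != town.strip().upper():
            continue
        # Missing columns or short rows give None, so fall back to ''
        if any(sic_code in (row.get(col) or '') for col in SIC_COLUMNS):
            yield row


sample = io.StringIO(
    'CompanyName,CompanyStatus,RegAddress.PostTown,SICCode.SicText_1,SICCode.SicText_2\n'
    'A LAW LTD,Active,SHEFFIELD,69102 - Solicitors,\n'
    'B HOLDINGS LTD,Active,Sheffield,70100 - Other,69102 - Solicitors\n'
    'C LAW LTD,Dissolved,SHEFFIELD,69102 - Solicitors,\n'
    'D LAW LTD,Active,LEEDS,69102 - Solicitors,\n'
)
names = [row['CompanyName'] for row in find_leads(sample, 'Sheffield', '69102')]
print(names)  # ['A LAW LTD', 'B HOLDINGS LTD']
```

Targeting insurance brokers in Manchester would then be a call such as `find_leads(f, 'Manchester', '66220')`, where 66220 is the UK SIC 2007 code for activities of insurance agents and brokers (worth double-checking against the official SIC list before relying on it).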

Conclusion

While I’ve used a basic example with a fictional business and kept things simple, I hope to have shown the potential for using open data within a business. The data doesn’t have to be ‘open’ either; it can be closed (proprietary). Ask the right questions, and valuable insights can be found in most data sources. The challenge, however, is finding the right questions to ask…

  1. I would imagine a large amount of paper is involved in this business.