share      tweet      post

This is the first of two posts on Intellectual Property data in preparation for a hackathon at Manchester Library on 12th March 2016. This is an overview of the datasets and process for mapping currently available IP data.

To take part in the event at Manchester library visit EventBrite - IP Magnet Challenge.

Intellectual property (IP) applications (patents, trademarks) are handled in the UK by the Intellectual Property Office (IPO). This data has for a long time been publicly accessible, through tools such as the IPO patents and trademarks journals.

There are also various developer/consumer services for access to this data. Weekly updates to the Patent journal are published in XML or PDF format.

For data analysis, these services aren't ideal - web seaches find specific information, rather than data. For patent applications, weekly XML would also have to be processed to be turned into a current snapshot of the full data, and would ideally be in a more accessible format for non-developers.

There are a couple of datasets listed on Gov.uk, where the data has been compiled into snapshots in order for statistical research to be undertaken. A full description of the fields is available on the pages for each dataset.

Licence: The data is released under an Open Government Licence.

Patents data

The snapshot data for patents includes postcode districts for each application - for the applicant(s), and the current proprietor(s).

Column Description
Applicant/s at application The name of the applicant/s first recorded for the given application. Applicants separated by ;
Postcode/s of applicant at application Postcode/s of the applicant/s at application (note this only contains the area code and present for UK applicants). Postcodes are separated by ;
Current proprietor/s The name of the current proprietor/s of this patent. The proprietor could be the same as the applicant but will differ if the applicant has changed name or ownership of the patent application has changed hands. Proprietors are separated by ;
Current proprietor/s postcode/s The postcode/s of the current proprietor/s. Postcodes are separated by ;

There are 523,367 records. Of these 159,913 have postcode information, representing the number of applications from the UK (around 30%).

Analysing location data

Comparing the two columns for applicant and current proprietor column, there are only 34,239 records where they are different, around a fifth of the total.

As there is not much difference, taking just the applicant postcode district data, this post will visualise the number of applications coming from different postcode districts. For geographic visualisations, choropleth maps can be useful to visualise differences in areas. This is colouring areas based upon a value for that area (in this case - number of applications).

Postcode districts

The IP data gives the postcode district rather than full postcode. What is a postcode district? Postcode data is split into different levels.

Level Example Description
Postcode area M The area is identified by one or two alpha characters at the start of the full postcode, the letters being derived from a town, city or district falling within the postcode area. There are, at present, 120 postcode areas in Great Britain.
Postcode district M2 A sub-area of the postcode area, specified by the character sub-string within the first half of a full postcode. There are approximately 2,800 postcode districts in Great Britain.
Postcode sector M2 5 A sub-area of a postcode district, whose area is identified by the number third from the end of a full postcode. There are approximately 9,000 postcode sectors in Great Britain.
Postcode unit M2 5PD An area covered by a particular postcode. Postcodes are an alphanumeric abbreviated form of address. Postcode units are unique references and identify an average of 15 addresses. There are approximately 1.7 million postcode units in the UK.

(Data abbreviated from Ordnance Survey Linked Open Data Ontologies.)

Postcode open data

Ordnance Survey have a free open data product called Code-Point Open, which provides coordinate-based location information for all 1.8M postcode units.

However, for choropleth maps we need polygon data - areas that define the boundaries around particular postcode districts/areas/sectors.

This data is not freely available from the OS, as it is included in an OS premium product:

It is worth nothing that public sector organisations should have free access to OS premium products as part of the Public Sector Mapping Agreement. Any public libraries or other public organisations that want access to OS premium data should contact their GIS teams.

Instead of using premium OS data, the boundaries can be derived from the open data. If you were to plot all of the postcodes in the UK on a map and then draw a line around all the 'sets' (whether that set is grouped by area/sector/district/unit) then those shapes are effectively the polygon data.

Fortunately this has already been done by a geospatial organisation called Geolytix.

Licence: Released under the same terms as the OS OpenData license with the single addition of GeoLytix to the attribution list.

Carto

Having found the relevant data sets, Carto is an online tool to map and analyse location data. A free account (restricted by 250Mb limit) is generally enough to do a lot of visualisation work.

Both the Geolytix open data, and the Gov.UK snapshots of IPO data can be directly uploaded into Carto as datasets. These can then be joined (using the postcode district in both sets), and manipulated using structured query langugage (SQL). The Carto docs are the best places to get started if you want to delve into mapping data within the online tool.

Filtering data to Manchester

Filtering data can aid in display, as any kind of interactive visualisation will perform better with a smaller set of data. Wikipedia lists the districts within the M postcode area:

The map below shows the relevant postcode districts in the M postcode area, coloured to show the number of IP data applicants. Clicking on a particular area will show some more detailed information.

Contains OS data © Crown copyright 2016, and GeoLytix open data polygons


comments powered by Disqus