Placr News

February 14, 2012

The alternative briefing for the new Data Strategy Board Chair

Filed under: Uncategorized — Jonathan Raper @ 9:53 am

Dear DSB Chair,

This is an alternative briefing to the one that you will be offered by civil servants in Dept. of Business, Innovation and Skills (BIS) after your appointment. Doubtless you have already been visited by @SirBonar who will have impressed upon you the dangers of talking to the press, industry, academia or the open community, but I have a mole who will have slipped this paper into your pile. Which is why you are reading this now…

Let’s start with your aims, which are to:

“seek to maximise the value of data from the Public Data Group (PDG) of Trading Funds for long-term economic and social benefit” para 11 Autumn Statement 2011

Obviously, as Public Data Group (PDG) will be made up of Trading Funds that report to Shareholder Executive and Ministers at BIS, you will have no influence over PDG, who will resent your meddling in their affairs. Should you spurn their attempts to “clientise” you (plan A), they will move on to “death by consultancy” (plan B), in which spurious evidence will be presented to you masquerading as independent thinking. You will probably expect to dodge these hazards by commissioning your own research with some of your budget. This will enable you to define “long-term economic and social benefit” not as “a strong civil service with profitable trading agencies” but as “reduction of costs across the economy, creation of 21st century digital infrastructure and freedom of the digital commons”.

Next, you will have to be concerned about some of the members of your committee who will be there to keep you “on message”. After all 70% of the committee will be from the public sector, who have absolutely no need to release any data unless Ministers insist. Remarkably, the committee will be allowed to have ‘data users from outside the public sector’; however, they will also have been carefully chosen. Like all committees that are constituted to ‘do good’, you will be given just enough members (15-20?) to ensure that you cannot make bold decisions. So you will probably set up selected sub-committees to take decisions and ensure that their minutes are ‘taken as read’. This should enable you to lasso some core reference data without the sleepier members of the committee noticing. You will have to be careful with this ploy: the PDG will carefully ‘descope’ any data you want to release. The Ordnance Survey were forced to release 1:10,000 maps as open data in 2010, but they cunningly removed the field boundaries and contours from the final released data.

Of course you will have some money to spend on freeing data. However, the trouble with using DSB money to “buy” data from the PDG is that taxpayers have already paid for it: after all there is a government ‘first purchaser’ for all data created by government. So if you spend some of the PDG profits on data, you will be spending taxpayers’ money for the data a second time… and sooner or later the Taxpayers Alliance will notice this. There is nothing so dangerous for a quango as tempting tabloid headlines about “stealth taxes”. Of course the PDG will be delighted if DSB has a “publicity malfunction” along these lines, as this will strengthen their hand with Ministers. So the trick here is to use your money in a really cunning way, e.g. to buy out system integrators who have exclusive contracts to handle public data, such as Texunatech, who distribute Edubase with restrictions.

You will also have rivals for Ministers’ affections, and you will have to watch your back carefully. There is the Advisory Panel on Public Sector Information, which has almost the same terms of reference as you do, but belongs to the Ministry of Justice. Oh dear, then there is UK Location which is “a UK pan-government initiative to improve the sharing and re-use of public sector location information”, which is in DEFRA. Hmm, then there is the Office of Public Sector Information, which is, er, also part of the Ministry of Justice. And then there’s the Office of Fair Trading Markets Group who published a report on “Commercial use of public information” in 2008. And Consumer Focus have just launched their Online Public Services Manifesto that calls for government to “Publish public information in ways that make it easy to be re-used”. Not to forget the Information Commissioner’s Office whose “mission is to uphold information rights in the public interest, promoting openness by public bodies and data privacy for individuals”. Carving out a niche is going to take some creativity… so rigorous branding, web identity and mission statements will demand high quality production values and will need to be widely circulated, preferably on paper as decision makers prefer it that way.

Finally, beware those pesky MPs, currently debating the Protection of Freedoms Bill in Parliament. If they enact this bill unaltered, then para 102 extends the Freedom of Information Act to allow requests to be answered as data and not just PDFs of documents. Cleverly, this clause in the bill (now in its report stage) also allows public bodies to make a charge for any data that they are forced to release (Section 3, modifying Section 11 of the FOI Act of 2000). This means that every darn government department will be releasing data and charging for it without going through the Public Data Group in BIS. Unless PDG get the data, you can’t tax the PDG dividends for the DSB budget, which would be a disaster.

So, what then can you plan as your legacy? Since the City, corporations and the Open Community will not see you as representing their interests… perhaps a high objective would be the unification of government data regulation into a single quango called OFDATA. Play the long game… Apple and Google will likely force change on the PDG agencies with extreme prejudice, and so the DSB budget may not last all that long. But creation of an integrated regulator for data would be a real legacy and one that would always carry your name.

Best wishes

Jonathan

January 17, 2012

What the Minister said to us about #opendata

Filed under: open data — Jonathan Raper @ 5:52 pm

Via the good offices of @julianhuppert (Lib Dem MP for Cambridge), today I was able to meet Ed Davey (Minister responsible for #opendata) at Dept. of Business, Innovation and Skills (BIS) with an Open Rights Group-led delegation of Chris Taggart (@countculture), Harry Metcalf (@harrym) and Jim Killock (@jimkillock). We asked to meet him to put across the views of developers and SMEs on the government’s new institutions for #opendata, which seem to be putting up a paywall around the core reference data of maps, land records, addresses, company data etc, even though there are government first purchasers of the data.

NOTE: this was an on-the-record meeting with a Minister, but these are my notes and the minister’s office or ORG might have a slightly different record.

The meeting opened with some arguments from our side about the democratic importance of opendata, how we believe it could be opened further without significant costs (given that internal government trading makes up most of the spending) and the importance of #opendata releases as a way to reduce the friction of charging around data. When the Minister spoke his view was strikingly close to the arguments in the Open Data Measures policy statement, in that he strongly defended the proposed new institutions for #opendata, the Public Data Group (PDG) and the Data Strategy Board (DSB), against our counter-arguments. He said, for example, that OS maps were excellent by international standards because OS trades its data and that this trading strengthens its focus through market challenge. He said that Met Office were releasing more data than any other meteorological agency in the world under the current government’s policy, and that Companies House had reorganised because of the digital revolution and data releases. He argued that the PDG would drive efficiencies through de-duplication of back office capacity in the agencies and that DSB would release further open data by wise supervision of the £170m government budget for public sector data purchases.

Given the government’s stellar track record on releasing data so far I was really hoping for the Minister to be a little more pro #opendata, pro agency reform to achieve wider transformational change. As it was, he defended the principle of government monopoly trading of data, and argued that free data can co-exist with chargeable data despite, in our view, the damage this does to the investment climate. He also mentioned that he saw freemium models as ‘very interesting’, whereas we would like to decouple free from the ‘mium’ (which would be consultancy around the free data).

We counter-argued against these views, and I think we gained some traction with him. We pointed out that full release of data is transformational, removing the friction of monopoly government trading from the market. For example it takes Royal Mail, OFCOM and Ordnance Survey to administer and regulate trading in the national list of addresses: this is very inefficient by comparison with open release. As we represented three of the small businesses meant to be delivering the government’s vision for #opendata, we pointed out that we find it hard to get investments in the current situation where the current charging regime for core reference data represents a tax on all UK digital service transactions. Other countries are releasing their data at a rapid rate (following our example, and learning from our mistakes) and the UK’s advantage as the best place to build an #opendata business will be lost in months not years. Finally, we argued the case on affordability: most of the money spent on core reference data is spent by government itself (e.g. 83% of Met Office revenues, 58% of OS revenues), and so giving the data away free means that government departments don’t have to buy data… and the money freed up from this spending can directly fund the agencies to release the data in the first place.

On this last point it was interesting that the Minister and Shareholder Executive (present to advise) had both read my blog on affordability of #opendata. They disputed the figures I have taken from the four agencies’ Report and Accounts, which I argue show that the releases can be cost free. We have agreed to further discuss the ways in which they think costs will arise once releases are done and internal trading ceases. It may be we are using different language for the same thing: we know it costs money to collect the data, but we want the government first purchaser to pay this bill without using trading to allocate the funds… this is not extra spending. Achieving this will involve some work in government (by the DSB?) to specify each agency’s public task and to allocate costs where they necessarily arise. But there should not be any new spending in our scenario, e.g. opening Land Registry’s registers to create new services while retaining charges for stamping land documents is perfectly possible.

The minister wanted to put the ball in our court by asking the open community to say where the new revenues are to come from if data is released. He said that the numbers in Rufus Pollock’s report on the economics of open data were based on ‘heroic assumptions’. This is the real challenge to the #opendata community: are we arguing that we can make more money from data than government monopoly trading by agencies? No, not in the short term: this is digital infrastructure and has a long payback period across millions of people and businesses. Our modest startups will not out-trade the Ordnance Survey any time soon. However, if we allow managements of the data agencies to frame the debate on this question, the government will never release any further data. As Chris Taggart eloquently put it… did we stop and ask how the Internet was going to pay for itself? No: we allowed researchers and businesses to innovate… and the Internet has transformed our economy and society in a decade. Open data on registers, services, performance, places etc is the next phase of digital infrastructure. Without it, we will not have the liberal home market that creates the next great digital businesses in the UK, and if we apply short-term cost-benefit arguments the government will never act to liberalise in the way we need it to. If anyone doubts the profound change that is coming to digital infrastructure, look at the influence of the smartphone app stores in publishing, newspapers and transport. This is the context for the need to frame the question about benefits of open data in an entirely different way around infrastructure, a new transparent politics and new business models.

So this is a dangerous moment: allowing public sector managements holding monopoly government trading rights to define the public interest in open data will seriously undermine the fantastic progress this government has made on open data. We’ve made significant investments in Placr to try to create one of those ‘next gen’ businesses. We have been today to put our case to the Minister. Only time will tell if he seizes this opportunity to reform, deregulate and deliver the environment we need to innovate with #opendata.

December 16, 2011

What it will cost to free the rest of UK government data (spoiler: £0)

Filed under: Comment,open data,Thinkpiece — Jonathan Raper @ 1:50 pm

First, the good news. The UK government has made good on its promises to release open data across government in 2011, and this year has seen a dizzying sequence of open data announcements, most recently in the Open Data Measures in the Autumn Statement. Not only has the government opened the data, but it has put in place institutions (like the Transparency Board), portals (like data.gov.uk) and funding (through Technology Strategy Board). This is all profoundly good news and has enabled the growth of a cadre of open data companies like Cycle Streets, Open Corporates and my own company Placr. We are racing to build new companies built on the open data and we are already paying taxes that go back into the Exchequer, offering free services to the public and value-added offerings to businesses.

However, there is still a cloud on the horizon. Some of the most important reference data is still locked up like the detailed maps, addresses, land records, school databases, the national planning application register and court records (details in my blog post here). The government held a consultation in the summer over the formation of a Public Data Corporation (PDC), and we presented arguments as to why embedding a government trading monopoly at the heart of open data was a bad idea, and this seemed to resonate. However, when we read the government’s Open Data Measures in the Autumn Statement we were very disappointed to see that all they have done is change the acronym from Public Data Corporation (PDC) to Public Data Group (PDG), and kept the substance of the previous proposal. This leaves us with a problem, as we are not going to be the “world leader in open data” as George Osborne wants by taxing every digital service transaction in the UK for the core reference data that the government has already paid for!

To understand why this is happening and how to fix it, we need to see what the Autumn Statement is proposing to do. I have drawn the following diagram after a brainstorming session at the Open Rights Group to show the government’s plan:

[Diagram: OpenDataPlans-AutumnStatement11-plan1.png]

The Public Data Group will be a merger of the Land Registry, Meteorological Office, Ordnance Survey and Companies House. Analysis of their 2010-11 Report and Accounts shows that the agencies collectively have revenues of £741m, costs of £649m and that they make profits of £92m. We are told that this trading is necessary to save the taxpayer the cost of these activities. However, when you realise that 83% of Met Office and 58% of Ordnance Survey revenues are from government itself, you can see that these costs are mainly being paid by taxpayers anyway. The non-government sales income from these two agencies (MO/OS) is only £84m. Companies House and Land Registry are different because they operate registries and manage transactions for business and house buyers, and so government usage is low, with users paying all the costs.

In the rest of the diagram you see how the government proposes to add two new agencies into the mix to moderate the operation of the PDG government trading monopoly. The DSB will take some dividends from the PDG trading operation and will be able to spend this on buying data to be freed, or on services. However the scale of its suggested funding is orders of magnitude lower than the cost of freeing all the data outright, and so its influence will be marginal unless it also directs the PDG business plan. In this system the taxpayer would be paying twice for the data: once for the core operations through taxes and then a second time to free the data through the DSB dividend income, which the Treasury would forgo. Meanwhile the Open Data Institute will spend its money on research over the long term… and cannot influence the trading activities.

Surely this is the wrong model for the future of government data as the paywall around the core reference data will continue to inhibit private sector investment and reduce the tax revenues from innovation. There are too many new players in the system, all incurring costs and adding friction to the movement of the data. I think the government should instead release all of the data freely to stimulate private sector growth, as I show in this diagram:

[Diagram: OpenDataPlans-AutumnStatement11-prop2.png]

If government releases the data freely and encourages the agencies to do consultancy and packaging of the free data, the costs of these agencies will have to be (or are already being) restructured as follows:

  • The Met Office has already adopted this plan by releasing most of its data in the Autumn statement… it will lose most of its data revenues on the one hand (83% of costs?) but government will not need to pay for the data on the other hand. If the Met Office makes 17% cost savings on operations OR raises 17% extra consultancy charges on the freely released data (=£33m), then its data releases are revenue neutral for government. The Met Office Business plan has clearly been revised to cover this change as it is not asking for more money (NET EXTRA COST: £0);
  • The Ordnance Survey needs to release its data freely so government doesn’t need to pay its current £74m bill for map data and instead it should use this money to fund OS directly: this change would be revenue neutral. The OS would lose its £52m of consumer and business income from selling data, though it would be likely to earn some revenues through consultancy on packaging and delivering the released data, say £10m. So the extra cost to the taxpayer would be £42m if nothing else changes, though in reality the OS would be able to slim down and would not need all of its current 131 Sales and marketing staff. This could lead to savings of another £10m (NET EXTRA COST: ?£32m);
  • Companies House is proposing to release its key data in the Autumn statement (more details needed!), so it looks likely to have to reduce its cost base and develop its services income as the private sector adds to, and replaces its services with new ones based on the open data. As it has already moved to open its data, this change is clearly now in its business plan (NET EXTRA COST: £0);
  • The Land Registry is actually the biggest question… it can and should open access to its cadastral (land) records freely, but the government still need to supervise and assure land and property transactions. There is a straight choice here: pay its £281m costs out of government funds and give us all a tax cut (great stimulus when the economy is flatlining)… or carry on taxing land transactions (possibly in new and different ways?). This is not an open data question, as the data can be released at some limited cost while the government still charges for the transaction assurance (NET EXTRA COST: ?£5m).

This analysis (Excel spreadsheet here) implies that the only real shortfall on full release for these agencies would be the loss of OS non-government income of £32m net of savings on business-as-usual, plus ?£5m for the Land Registry to give open access to its data (=£37m). To keep the costs low the Land Registry could keep its registration income, but provide access to its land records as open data.
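The tally behind that £37m figure can be checked with a few lines of arithmetic. This is only a sketch of the sums quoted in the bullets above, not the spreadsheet itself: the variable names are made up for illustration, all amounts are in £ millions, and the figures marked with a "?" in the text remain estimates.

```python
# All figures in GBP millions, taken from the bullet points above.
# Names are illustrative only; the "?" estimates in the text stay estimates.

# Ordnance Survey: lost non-government data sales, offset by assumed
# consultancy income and assumed savings on sales and marketing staff.
os_lost_data_sales = 52
os_consultancy_income = 10   # "say £10m" in the text
os_staff_savings = 10        # "another £10m" in the text

net_extra_cost = {
    "Met Office": 0,
    "Ordnance Survey": os_lost_data_sales - os_consultancy_income - os_staff_savings,
    "Companies House": 0,
    "Land Registry": 5,      # "?£5m" for open access to land records
}

total_shortfall = sum(net_extra_cost.values())

for agency, cost in net_extra_cost.items():
    print(f"{agency}: £{cost}m")
print(f"Total shortfall on full release: £{total_shortfall}m")
```

The government's existing £74m spend on OS map data does not appear in the tally because, on the argument above, it is redirected to fund OS directly rather than saved or added.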

The moves by the Met Office and Companies House show that the government believes that agencies can release their data, cut their costs and still deliver core data. Therefore I don’t think free data release is “unaffordable” in any sense, and the Ordnance Survey also needs to be restructured to deliver this change in a broadly revenue neutral way. For example, it could close services that duplicate the private sector such as the ‘Get a Map’ service. Crucially, amongst the OS data releases would be AddressBase… the full UK national address database, which is a vital open dataset needed to power almost all digital services. The overall value to the economy of removing monopoly pricing from detailed maps and addresses would be a marginal increase in economic efficiency across a vast range of transactions.

I have shown here how the costs of these four agencies are either internal government trading, being cut voluntarily or being levied as transaction costs (which could remain while the data was opened). However, in theory losing the £92m profits of these agencies would also be a cost to government funds. On closer analysis… although MO, OS and CH returned £15m to the government, Land Registry had to pay a massive £87m for restructuring costs and returned no cash to the government. So if we add the “loss” of the £15m profit to the costs of releasing data, then the total bill for releasing all the data and losing profits could reach £52m on a business-as-usual scenario, though restructuring OS can probably eliminate most of this cost. If LR continued to trade and charge for transactions (whilst opening its data), then in a ‘normal’ year its gross profits of £69m could actually pay all the bills for data releases elsewhere!
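The business-as-usual bill in the paragraph above is again simple addition, sketched here with illustrative variable names and all amounts in £ millions. The £37m shortfall is the estimate for releasing the OS and Land Registry data, and the comparison with Land Registry's 'normal' year profit simply restates the claim in the text.

```python
# All figures in GBP millions, as quoted in the text; names are illustrative.
release_shortfall = 37        # OS (?£32m) plus Land Registry (?£5m)
forgone_cash_returned = 15    # returned to government by MO, OS and CH

total_bill = release_shortfall + forgone_cash_returned

# Land Registry gross profit in a 'normal' year, if it keeps charging
# for transactions while opening its data.
lr_normal_year_profit = 69
lr_profit_covers_bill = lr_normal_year_profit >= total_bill

print(f"Business-as-usual bill: £{total_bill}m")
print(f"Covered by LR profit in a normal year: {lr_profit_covers_bill}")
```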

So… given the “unaffordable” costs of data release are actually a mirage… the government should be acting boldly to release all the data and stimulate the economy. Therefore the only remaining job that the government has to do to get the final open data #WIN is to reform these agencies. As far as I can see the only people who still oppose this are the managements of these agencies… and you would expect that wouldn’t you?

Jonathan Raper

November 7, 2011

Our submission to the Public Data Corporation Consultation

Filed under: Announcements,open data,Thinkpiece — Jonathan Raper @ 5:30 pm

Chapter 4 – Charging for PDC information

1. How do you think Government should best balance its objectives around increasing access to data and providing more freely available data for re-use year on year within the constraints of affordability? Please provide evidence to support your answer where possible.

Placr Ltd. published an open letter to Andrew Tyrie MP on our blog which presented arguments in response to this question:

Monopolies are bad; government monopolies are worse

All of the government’s options for the PDC in this consultation envisage the creation of a state trading body with a monopoly over the sale of the most important items of UK digital data infrastructure, viz. maps, land records, addresses and weather data. They are likely to be joined by company records, court proceedings and statistical data after the PDC is established. This data is already collected and paid for by government as part of its public task. In the digital era it is not necessary or desirable for the government to monopolise the distribution of this data: it can be released at marginal cost as open data through data.gov.uk. Monopolies face no pressure on their costs, and they act anti-competitively by instinct… yet almost every digital service business will have to pay a stealth tax to this state monopoly if the government proceeds with its plan for the PDC.

Growth needs free markets

Entrepreneurs can only invest with confidence when markets are free. The presence of a monopoly state body in the market for digital services is bad for investor confidence, and the uncertainty this engenders is hindering SMEs like mine from raising funding to power growth. By taking a significant amount of revenue out of this market in new tax, the government will make it harder for UK digital service businesses to grow quickly and create jobs. Meanwhile our competitors (think Ireland, Netherlands, USA) are liberalising their markets by releasing this core reference data freely and ‘getting out of the way of innovation’ as Michael Cross pointed out in his excellent article on the PDC in the Telegraph.

A failure to understand digital infrastructure

The government’s consultation document on the PDC fails to see digital core reference data as part of 21st century digital infrastructure. The use of addresses, maps, weather, statistics etc. in digital form is so pervasive in public administration and business that its wide availability and low cost is a necessary component of growth across the economy. The agencies to be brought together into PDC have just created the first common national address database (known as Geoplace). Despite it being a crucial piece of digital infrastructure it is only available as a high cost data product. This will limit the use of a dataset that should be used by absolutely every organisation in the country, with the loss of all the standardisation and efficiency benefits that this would bring.

Charging for transparency

Open data releases have created new services (e.g. live bus departures through Traveline), increased transparency (e.g. Treasury COINS database) and shown that key policy-making datasets (e.g. crime data) have major errors that need fixing. Every one of these open datasets will be dependent on the core reference data that the proposed PDC would control and tax. The government has seriously conflicted its policy on transparency by proposing to create a state monopoly that will hold a veto over data pricing and distribution. As an example, national transport datasets such as public transport performance, accident statistics and traffic counts will be free, but the maps against which they need to be referenced will be charged for by the PDC. New Zealand’s attempt to create a PDC in 2001 led to disaster with the Terralink trading fund going bankrupt without the possibility of a state rescue. The private sector picked up the state assets very cheaply from the liquidator leading to a total loss of the core data from the public sector.

Government trading with itself

In the consultation the government rejects the ‘Data utility’ (aka ‘open data’ publication) option for the PDC as it is “unaffordable” (para 4.16), and it excludes this option from the consultation. The existing agencies are trading funds with a combined turnover of £675m and costs of £585m, yet most of their revenues are from government itself (e.g. Met Office 85%, Ordnance Survey 60%, from their Report & Accounts). These agencies are also spending heavily on business development: for example, the Ordnance Survey Accounts for 2010-11 show that it employs 130 Sales/marketing staff, which is 12% of the total. Removing internal trading and cutting out business development makes the release of core reference data eminently affordable. Some income would be retained in services, say £100m, and possibly a third of the costs could be saved, reducing costs to £410m. Given its importance as infrastructure for key growth industries like digital services, the more important question is… can we afford not to free this data at a cost of c£300m invested in digital infrastructure?

2. Are there particular datasets or information that you believe would create particular economic or social benefits if they were available free for use and re-use? Who would these benefit and how? Please provide evidence to support your answer where possible.

Some of the most important datasets created by the government as part of its public task are still not released or subject to charging or access constraints. These datasets could be powering growth but government monopoly producers are blocking release by pursuing their institutional self-interest rather than the national interest. The attitude of many data owners in government is that: “we like the control that our data monopoly gives us, we keep the (relatively small) amounts of money we make, and no one can ask awkward questions about what we do.” This must end if open data is to reduce data costs across the economy and to enable SMEs to start new businesses powered by open data.

The most important datasets/services that would produce economic/ social benefits are:

- GEOPLACE, comprising addresses, postcodes, national land and street gazetteers. This data is critical to the success of re-use of almost all of the rest of the government’s open data as every activity has to be referenced against these datasets. The sheer irony and self-contradiction of charging for the core datasets and releasing other dependent data openly is astonishing as Geoplace is the core of the UK’s digital data infrastructure. This is the most important reference dataset held by government: growth will be taxed and constrained in every other area by retaining this behind a paywall. Each of the datasets below viz. NPAR, CH, Edubase and Trust need Geoplace to be reused as does transport, streetworks, health and crime data. Placr is forced to use OpenStreetMap to publish bus departures from Traveline and to support bus stop defect reporting by travellers.

- NATIONAL PLANNING APPLICATION REGISTER, comprising a record of all planning applications to local authorities and aggregated by the government’s Planning Portal. This was operated as a 5-year pilot through private contractor Emap Glenigan from 2005-10. When the service was closed down no attempt was made to offer the service as open data. Planning applications are of considerable public interest and there could be significant innovation in property searching and valuation if this data was made freely available as open data. Placr is looking for opportunities to aggregate and freely distribute open data where there is actionable information, as in this case. We have published our business models for this type of data in a blog post “The growth case for open data”

- COMPANIES HOUSE RECORDS, comprising details of all private companies that are collected by law. The data is behind a paywall and there is no open API. This is preventing the state register becoming the dominant way to reference companies, as a commercial competitor, Dun & Bradstreet, is charging both government and commerce for use of its DUNS number. This is a classic case of government failing to return benefits from fulfilment of its public task.

- EDUBASE, comprising the register of educational establishments in England & Wales. This database is available for reuse under the OGL, yet bizarrely you have to pay for more than two database extracts per year: “Under the Power of Information framework, public and commercial users are entitled to two free extracts per year, after which there is a service fee”, from FAQ. Placr already publish Schoolbrowser with attainment data, but we need to be able to add data from Edubase to add sufficient value to be able to make a return on the data.

- TRUST, comprising all Network Rail short (<24 hours), medium (weekly) and long term timetables and real-time running data as exclusively sub-licensed to the Association of Train Operating Companies. This data is produced as part of Network Rail's public task funded by the taxpayer and is already distributed to the rail industry through the Network Rail External Services Gateway and TDnet. Without this data no independent scrutiny of rail performance can be undertaken. Placr already publishes a national bus timetable service at placr.mobi with a social media feed for every bus stop in the UK. We cannot turn this into a free national public transport app without access to open rail departures, and this omission restricts the value addition we can achieve.

- COURT LISTINGS, comprising details of court cases in the Royal Courts of Justice, Crown Courts, selected County Courts, along with Case Archives and Legal News. This data is published via exclusive licences through Courtel and Bailii but is not available as open data. Thus it is impossible to compare open crime data with court outcomes without paying to access one of the licensed services.

3. What do you think the impacts of the three options would be for you and/or other groups outlined above? Please provide evidence to support your answer where possible.

The consultation document identifies five charging options and then dismisses two of them (‘Data utility’ and ‘profit maximisation’ models) leaving only models based on charging. As the ‘Data utility’ model is equivalent to the open data model this would leave the extraordinary situation of core reference data behind a paywall and most other government data released openly, but dependent on the data behind the paywall. Placr are concerned that the dismissal of the ‘Data utility’ model on the basis of affordability (§4.16) has been done on anecdotal evidence from 4 unrepresentative Cabinet Office ‘data user seminars’ whose participants were asked ‘leading’ questions. In §4.3 a 146 page Treasury-commissioned Cambridge University study is set against “anecdotal evidence” in a way that reflects very badly on the methodology of the Consultation. If the authors of the Consultation wished to challenge the conclusions of the ‘Cambridge Study’ then they should have presented evidence of equal weight and detail to support their argument. In the circumstances the exclusion of the ‘Data utility’ option from consultation is a serious mistake without sufficient justification, and Placr believe that it should be examined formally as part of the Consultation outcome, despite the position taken in the document.

As argued in the open letter on the PDC quoted above in (1), all charging options will damage growth opportunities for re-users of government data by entrenching a monopoly state agency at the heart of the UK’s digital infrastructure. The PDC will need to find business models to sell government data already collected and paid for as part of the public task so that it can cover its own costs: these charges will be an additional tax. Yet there will be no market mechanism to bear down on costs and prevent excessive pricing, and we do not believe that the Shareholder Executive can provide market-equivalent disciplines.

There will also be a structural governance problem in the PDC if it is set up to charge for government data collected as part of the public task. Under any of the charging models, the PDC management will supervise the market as the monopoly provider. This model has already caused conflict between the private sector and trading funds as monopoly providers conduct ‘beauty contests’ to decide which business partner to select when technology enables new products and services to be created. There has been litigation over licensing decisions eg between Ordnance Survey and GetMapping (http://www.guardian.co.uk/media/2002/feb/25/newmedia). This governance problem creates uncertainty for investors and makes it hard for SMEs to enter the market for data products as the PDC is required to risk its rate of return with unproven new companies. The Trading Funds have rarely been willing to do this and have taken the safe option of cooperating with larger corporations. In the Data Utility option, SMEs take the risks of innovating with open data using their own capital to create new products and services.

If the PDC is allowed to charge for data using Freemium models there will also be arbitrary divisions between data sets selected for free release and to become chargeable, as for example with Edubase, which allows two free extracts per year. Even if the ‘harmonisation of charging’ option were adopted, the sheer complexity of the various datasets would make it hard to achieve consistency.

These problems can only be mitigated by adopting the Data Utility option. This option also offers the growth advantages of a tax cut (eg removing Land Registry map charges from house purchase property searches) and a market liberalisation for digital services. It also eliminates government to government trading by a monopoly state corporation, surely the most economically inefficient mechanism ever created.

4. A further variation of any of the options could be to encourage PDC and its constituent parts to make better use of the flexibility to develop commercial data products and services outside of their public task. What do you think the impacts of this might be?

Allowing the government to operate in the market as a commercial player in digital services would be an unprecedented step for a government that recently attempted to roll back the state with a bonfire of the quangos. In the Trading Fund model of trading outside public task the government would create a monopoly state corporation with no pressure on its costs and a management with no risk capital to provide commercial and market disciplines. In a high profile previous experiment with this model the New Zealand government created a state corporation called Terralink to group together the same agencies as the PDC would have. Terralink went bankrupt after 2 years of operation in 2001 when it could not fulfil a commercial contract because it had accepted very stringent terms that no entrepreneur risking their own capital would have accepted. The data assets were sold to the private sector by the liquidator and they were lost to the public sector. With its structural governance problem, a state monopoly like the PDC would be vulnerable to making wrong decisions when dealing with global corporations with massive market capitalisations. Note that the Department for Transport has already given Google access to real time traffic data on a privileged basis, on the grounds that Google can reach the market with its services. The PDC would be likely to make such deals again even though the long-term security of the government’s data would be much enhanced by a large number of SMEs all innovating with open data using their own capital.

Rather than create a state corporation to operate a digital services monopoly the government should do the opposite: release the data as open data, following the excellent example of a number of publicly funded bodies. For example, NHS Choices releases a large amount of health data to a heterogeneous collection of corporations and SMEs. Traveline, the national public-private bus information partnership, releases its bus departure information freely and there are now 50+ apps for smartphones available across the country remixing bus departures with other data, as placr.mobi does with social media updates. London Underground releases all its live tube running information freely and has struck a sponsorship deal with Microsoft for free use of its Azure cloud services platform to power real time distribution to dozens of apps, such as Busmapper and UK TravelOptions, which use Placr data-as-a-service solutions.

5. Are there any alternative options that might balance Government’s objectives which are not covered here? Please provide details and evidence to support your response where possible.

Placr offered an alternative vision for the PDC in the conclusion to its open letter, as follows:

An alternative vision for the PDC

The government is currently allowing an administrative reorganisation to set the agenda for 21st century digital infrastructure planning. However, we should not be listening to civil service managements when we are faced with global power shifts in digital services and media content. The Apple iTunes App store opened just over 3 years ago and it has already restructured the newspaper and mobile phone industries, with publishing and transport next in line. The way to grab the biggest slice of this massive new global market is liberalisation at home so that the UK is the natural place to base investments. At this precise moment, when we desperately need growth, if the PDC freed the maps, land search charges, company data, statistics, addresses etc. it would be akin to a tax cut across the economy *and* a simultaneous industrial stimulus in a key growth area.

PDC plans must be re-thought

Our company is one of hundreds of UK start-ups in digital services that can immediately exploit liberalised core reference data alongside open data releases. We re-use government data in transport, education, crime and health to feed smartphone apps and provide business services. We are an early stage business that creates jobs in London and pays UK taxes, already turning over £120K a year from organic growth after a year and a half. We want to expand in the UK and export our expertise. But if we have to pay a tax for the use of core reference data to support a government monopoly, it makes it hard to develop profitable business models around open data and the scope for growth is greatly reduced. This is a decision of huge importance for the UK digital services industry… one of the few new industries that can really get us back to growth.

Chapter 5 – Licensing


6. To what extent do you agree that there should be greater consistency, clarity and simplicity in the licensing regime adopted by a PDC?

Placr view licensing questions as greatly subservient to charging policy and governance. If the Data Utility model is adopted, then the Open Government Licence can be used everywhere. Creating a huge state monopoly to try to protect intellectual property to recover income is exactly the opposite of the recommendations of the Hargreaves Review. One of the great advantages of open data releases is the simplicity of the mission: to release data freely and to enhance the operation of the market, in particular to ensure:

  • access follows public money
  • government itself does not compete with open data users
  • there is equality of access (net neutrality for data)
  • there is stability of distribution
  • the distributing agencies do not act like monopolies

    7. To what extent do you think each of the options set out would address those issues (or any others)? Please provide evidence to support your comments where possible.

    Use of charging models necessitates a more commercial licensing regime. Placr’s experience is that these licences are complex and expensive to administer and enforce. Use of the open data model mitigates these risks and reduces the cost of managing licensing. SMEs in particular cannot afford to go to law and the need to sign licences with the PDC will be a show-stopper for many innovators.

    8. What do you think the advantages and disadvantages of each of the options would be? Please provide evidence to support your comments

    See answer to question 7.

    9. Will the benefits of changing the models from those in use across Government outweigh the impacts of taking out new or replacement licences?

    See answer to question 7.

    Chapter 6 – Regulatory oversight

    10. To what extent is the current regulatory environment appropriate to deliver the vision for a PDC?

    The current regulatory environment around data is very complex as it involves the Office of Public Sector Information, the Office of Fair Trading, Consumer Focus, Passenger Focus, the National Audit Office, the Office of Rail Regulation, the Patient Advice and Liaison Service, Ofcom, the Advisory Panel on Public Sector Information, the Information Commissioner’s Office, the Transparency Board and many others. The introduction of a PDC can help integrate these functions and supervise a Right to Data. However, it will be very difficult for a PDC operating in the market to fulfil this role as there will be a conflict of interest. Placr believe that the vision of the PDC as a coordinating and regulatory agency would be served by the open release of government data produced as part of the public task.

    11. Are there any additional oversight activities needed to deliver the vision for a PDC and if so what are they?

    There are some problems of regulation already inhibiting growth from open data. Giving the PDC statutory regulatory powers would help mitigate these issues, for example:

    - approving contracts for commercial processing of government data to ensure that the system integrators who do this work do not acquire any unintended rights over government data or its distribution

    - reforming isolated examples of data distribution that do not conform to the Transparency principles eg harmonising legal data releases by the Courts Service and Registry Trust, a non-profit body established to distribute court judgments on credit and debt

    - managing data.gov.uk as a national metadata service for data

    12. What would be an appropriate timescale for reviewing a PDC or its constituent parts public task(s)?

    As most departments of government do not have a formal statement of their public task, a first step should be for a PDC to identify this and express it as a set of data and services produced by it and subject to data release. This role can be overseen by Cabinet Office and Parliament. However, Placr believe it is hard to see a PDC fulfilling this role if it is trading in the market.

    October 15, 2011

    An open letter to Andrew Tyrie MP about the proposed Public Data Corporation

    Filed under: Comment,open data — Jonathan Raper @ 5:55 pm

    House of Commons

    London

    Dear Andrew

    I am writing to you as the CEO of a small SME to draw your attention to a government proposal that stands in complete opposition to the growth agenda that you set out in your recent pamphlet ‘It’s the Economy’ (PDF). In a BIS Consultation open until 27/10/11, the government proposes to establish a Public Data Corporation (PDC) out of several existing agencies and to set it up as a publicly-owned corporation to re-sell government core reference data commercially. I would like to explain the several ways in which this is damaging to the potential for growth and seek your support for a rethink on this plan.

    Monopolies are bad; government monopolies are worse

    All of the government’s options for the PDC in this consultation envisage the creation of a state trading body with a monopoly over the sale of the most important items of UK digital data infrastructure, viz. maps, land records, addresses and weather data. They are likely to be joined by company records, court proceedings and statistical data after the PDC is established. This data is already collected and paid for by government as part of its public task. In the digital era it is not necessary or desirable for the government to monopolise the distribution of this data: it can be released at marginal cost as open data through data.gov.uk. Monopolies face no pressure on their costs, and they act anti-competitively by instinct… yet almost every digital service business will have to pay a stealth tax to this state monopoly if the government proceeds with its plan for the PDC.

    Growth needs free markets

    Entrepreneurs can only invest with confidence when markets are free. The presence of a monopoly state body in the market for digital services is bad for investor confidence, and the uncertainty this engenders is hindering SMEs like mine from raising funding to power growth. By taking a significant amount of revenue out of this market in new tax, the government will make it harder for UK digital service businesses to grow quickly and create jobs. Meanwhile our competitors (think Ireland, Netherlands, USA) are liberalising their markets by releasing this core reference data freely and ‘getting out of the way of innovation’ as Michael Cross pointed out in his excellent article on the PDC in the Telegraph.

    A failure to understand digital infrastructure

    The government’s consultation document on the PDC fails to see digital core reference data as part of 21st century digital infrastructure. The use of addresses, maps, weather, statistics etc. in digital form is so pervasive in public administration and business that its wide availability and low cost is a necessary component of growth across the economy. The agencies to be brought together into PDC have just created the first common national address database (known as Geoplace). Despite it being a crucial piece of digital infrastructure it is only available as a high cost data product. This will limit the use of a dataset that should be used by absolutely every organisation in the country, with the loss of all the standardisation and efficiency benefits that this would bring.

    Charging for transparency

    Open data releases have created new services (e.g. live bus departures through Traveline), increased transparency (e.g. Treasury COINS database) and shown that key policy-making datasets (e.g. crime data) have major errors that need fixing. Every one of these open datasets will be dependent on the core reference data that the proposed PDC would control and tax. The government has seriously conflicted its policy on transparency by proposing to create a state monopoly that will hold a veto over data pricing and distribution. As an example, national transport datasets such as public transport performance, accident statistics and traffic counts will be free, but the maps against which they need to be referenced will be charged for by the PDC. New Zealand’s attempt to create a PDC in 2001 led to disaster with the Terralink trading fund going bankrupt without the possibility of a state rescue. The private sector picked up the state assets very cheaply from the liquidator leading to a total loss of the core data from the public sector.

    Government trading with itself

    In the consultation the government rejects the ‘Data utility’ (aka ‘open data’ publication) option for the PDC as it is “unaffordable” (para 4.16), and it excludes this option from the consultation. The existing agencies are trading funds with a combined turnover of £750m, yet most of their revenues are from government itself (e.g. Met Office 85%, Ordnance Survey 60%, from their Report & Accounts). These agencies are also spending heavily on business development: for example, the Ordnance Survey Accounts for 2010-11 show that it employs 130 Sales and marketing staff. Removing internal trading and cutting out business development makes the release of core reference data eminently affordable. Given its importance as infrastructure for key growth industries like digital services, the more important question is… can we afford not to?

    An alternative vision for the PDC

    The government is currently allowing an administrative reorganisation to set the agenda for 21st century digital infrastructure planning. However, we should not be listening to civil service managements when we are faced with global power shifts in digital services and media content. The Apple iTunes App store opened just over 3 years ago and it has already restructured the newspaper and mobile phone industries, with publishing and transport next in line. The way to grab the biggest slice of this massive new global market is liberalisation at home so that the UK is the natural place to base investments. At this precise moment, when we desperately need growth, if the PDC freed the maps, land search charges, company data, statistics, addresses etc. it would be akin to a tax cut across the economy *and* a simultaneous industrial stimulus in a key growth area.

    PDC plans must be re-thought

    My company is one of hundreds of UK start-ups in digital services that can immediately exploit liberalised core reference data alongside open data releases. We re-use government data in transport, education, crime and health to feed smartphone apps and provide business services. We are an early stage business that creates jobs in London and pays UK taxes, already turning over £120K a year from organic growth after a year and a half. We want to expand in the UK and export our expertise. But if we have to pay a tax for the use of core reference data to support a government monopoly, it makes it hard to develop profitable business models around open data and the scope for growth is greatly reduced.

    I would like to ask if you could support this call for the PDC to be re-thought, and for the merits of the open data model to be explicitly examined. This is a decision of huge importance for the UK digital services industry… one of the few new industries that can really get us back to growth.

    Yours sincerely,

    Jonathan Raper

    CEO, Placr Ltd

    August 24, 2011

    Pearson release content through APIs

    Filed under: Announcements,API — Jonathan Raper @ 11:59 am

    Pearson developer portal

    For the last couple of months we’ve been working with Pearson PLC, the global publishing company who own household titles including Penguin, the Financial Times, Ladybird and Dorling Kindersley. Pearson has shown considerable innovation when faced with the challenge giving so many established publishers sleepless nights: how to make their existing content available to customers through new media channels. With this aim in mind they announced the launch of the Plug & Play Platform, intended to engage external developers and encourage the use of their data in many more diverse ways than they could hope for if handling all development internally. To achieve this, they needed an API.

    We’ve been collaborating with the Plug & Play team towards the launch of APIs for three of their datasets, now live at the Pearson developer portal. The three datasets currently released are:

    • Dorling Kindersley’s Eyewitness London Travel Guide: information about things to do, places to see, where to go out, eat and sleep when visiting the UK capital. The data is the same that forms their well known guide books. The API allows search based on category, location, and free text search.
    • Longman Dictionary of Contemporary English: definitions, illustrations and pronunciations searchable by category, word or part of word.
    • Financial Times Press: searching over 500 articles on business, management, marketing and finance.

    It’s been a great experience to be part of opening up such rich content to developers. The API management system is provided by Apigee, and we at Placr developed the API functionality using the Ruby on Rails framework. A big issue for any publisher considering releasing assets historically released in print is the granularity at which to expose the data: should a fragment comprise an entire book? A chapter? A paragraph, sentence or individual word? Considering the granularity developers need, and transforming datasets into appropriate structures, has thrown up some interesting challenges. Being flexible whilst keeping the API simple to use has been our guiding principle. We’ve also worked hard to make the content available both in structured machine readable formats (XML, JSON, JSONP) and as a content API that can be searched and browsed by anyone with an API key using a web browser.
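
As a rough illustration of the difference between the JSON and JSONP formats mentioned above, the sketch below serialises one record both ways. The record fields, the function name and the callback name are invented for the example and are not taken from the Pearson APIs.

```python
import json
import re

# Hypothetical record: the field names are illustrative only.
entry = {"headword": "example", "part_of_speech": "noun"}

# A callback must look like a plain JavaScript identifier.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def render(record, callback=None):
    """Return plain JSON, or JSONP when a callback name is supplied."""
    body = json.dumps(record, sort_keys=True)
    if callback is None:
        return body
    # Reject anything but a simple identifier so the callback
    # parameter cannot be used to inject arbitrary script.
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid callback name")
    return f"{callback}({body});"


if __name__ == "__main__":
    print(render(entry))
    print(render(entry, callback="handleEntry"))
```

JSONP wraps the same payload in a function call so that a web page can load it through a script tag from another domain, which is one reason it is often offered alongside plain JSON.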

    Show Me London Android app

    Apps powered by these APIs are already starting to appear. Developers Metia have released the Android app Show Me London based on the Eyewitness Travel Guide API, and Tigerspike have written a blog post describing the development of their app, built on the FT Press API.

    Reaction to Pearson’s Plug & Play project has been very positive. Kin Lane on his API evangelist blog said:

       “the Pearson Plug & Play team has done a great job with their pioneering efforts in this space … The team launched with a diverse set of APIs that not only provide valuable content to developers, but also gives Pearson a good place to practice when it comes to serving up content via an API.”

    The reaction to the APIs on twitter has been illuminating. Phaseit asked the rhetorical question:

       “Did @pearsonplc just prove itself an order of magnitude more progressive than I believed?”

    Yep. I think maybe they did.

    July 20, 2011

    The growth case for open data

    Filed under: Comment,open data,Thinkpiece — Jonathan Raper @ 12:58 pm

    Open data releases (part 1)

    In part 1 of this story significant data releases occurred in 2010 via data.gov.uk and the London datastore and a huge number of new web sites and apps have been set up to leverage open data. However, typically these data sets have been the ‘low hanging fruit’ of static reports, maps and statistics. Only a few of the ‘high value’ data sets have been released that contain ‘actionable information’ e.g.

    If you look at data.gov.uk in detail you will find that it is stuffed with organograms and spending declarations, and that many of the datasets that started publishing have already ceased (e.g. A&E activity). Most government agencies still seem to be operating ‘user pays’ policies when you get beyond headline free data releases e.g. Meteorological Office data services. So while there are some notable early successes, there is clearly much more to do.

    The government seems to recognise this and the Prime Minister has just released a letter to cabinet colleagues in which he commits to a further wave of data releases. But hard-bitten experience on the GLA Digital Advisory Board suggests that there are still a lot of people to convince. The attitude of many data owners in government is that we like the control that our data monopoly gives us, we keep the (small) amounts of money we make and no one can ask awkward questions about what we do. The challenge now is to show what kind of difference open data can make to growth and transparency, to combat this kind of scepticism.

    Critically, there are still only a small number of organisations dedicated to leveraging open data and building businesses or services:

    This is pretty much the whole list at present. In my view this is largely because the really important datasets are still not released and the private sector does not yet have the roadmap and the policy stability it needs to invest in the longer term. In short: ‘give us the good stuff’ and we will finish the job.

    So it seems timely to re-state the case for open data and to put the argument in economic terms. This allows us to express the value of open data in terms of new taxes and jobs in the UK. If we can’t make this case, then government agencies are going to continue ‘data hugging’ with all the implications for the success of the government’s open data agenda.

    Open data releases (part 2)

    This is an attempt to work through the case for open data releases from end to end. It is important to look at open data as part of this whole chain… local interests in data often don’t see the bigger picture.

    1. Market failure

    Why do we have public data in the first place? In simple terms, Government has to act when there is ‘market failure’ to:

    • manage infrastructures eg Network Rail, Royal Mail
    • run essential services eg schools, health and policing
    • act as honest broker eg courts or census

    Each of these activities produces data where the government is the necessary first purchaser: you cannot run railways without timetables, or deliver mail without postcodes, or run public services without operational data such as RAISEonline for schools.

    2. Free re-use

    Datasets produced by government where it is a necessary first purchaser should be released under an open government licence i.e. be free for re-use with the minimum of processing beyond internal integrity checks and creation of metadata. This is because the marginal cost of distribution of data is very low. Whenever this is not done, then any data charges are a stealth tax.

    There are two limiting cases for government data distribution (and many in-between):

    a) Static data… updated daily or less frequently: this should be available through a download link provided by the departmental data creator and published on data.gov.uk. The extra cost of this to government is very low as the download usage is spread randomly over the day. The extra utility in wide distribution of this data is unlimited.

    If the government restricts access to this kind of dataset and charges for cost recovery e.g. postcodes (currently £4K p.a. for a full licence) then it is simply taxing end users as it has done in this case for 20 years. This becomes an industry and community cost that places UK PLC at a competitive disadvantage compared to other markets where the data is free, and limits market entrants to those able to bear the cost. The distributing agency also becomes a monopoly without any market pressures on its cost base or pricing. The Ordnance Survey was the best example of this for many years until the open community and the press forced a change in its operating model and the release of OS Open Data.

    b) Real time data from operational systems… generally updated every few minutes or more frequently. This needs scalable distribution outside government as you cannot spread usage like you can with static downloads. In this case the data itself should be freed for re-use by open government licence and government should allow the market to create re-distribution channels from the operational systems. In some cases the market will redistribute for free to all: TfL have done a deal with Microsoft to distribute the tube real-time data in return for the use of the TfL data as a reference case used to advertise its cloud platform Azure. In other cases the market will charge for re-distribution but government may bulk-purchase access to create socially or economically desirable access. Hence Traveline have done a deal with Trapeze to bulk re-distribute bus departures at a fixed price in the Nextbuses API so that end user charges are low and flat for most scales of usage. This kind of distribution allows unprecedented insight into the performance of state-run services and the way the government manages its resources.

    If the government wants to charge for the redistribution of this kind of data (or fails to put licence arrangements in place to ensure public money achieves access), then it will continue to tax end users as it has in this case since rail privatisation in 1996. Note that the Association of Train Operating Companies (ATOC) charges £27K p.a. for timetables and fares data from publicly-funded Network Rail. This cost is passed on to consumers… the cheapest rail departures app for smartphones is c. £3 compared to typical transport app prices of £0.59. The costs are also passed onto the UK rail industry, which has the highest unit costs in the EU as the McNulty Report demonstrates. Monopoly distribution by ATOC has led to arbitrary licence refusals now being investigated through an Office of Rail Regulation consultation.
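
To put the licence cost in perspective, the rough sketch below works out how many app sales a developer would need each year just to cover the £27K ATOC fee at the two price points quoted above. It assumes, purely for illustration, that the developer keeps the full sale price (no app store commission, VAT or other costs); the function name is hypothetical.

```python
ATOC_LICENCE_GBP = 27_000      # annual timetable and fares licence quoted above
RAIL_APP_PRICE_GBP = 3.00      # cheapest rail departures app (approximate)
TYPICAL_APP_PRICE_GBP = 0.59   # typical transport app price


def sales_to_cover_licence(licence_gbp, price_gbp):
    """Whole number of sales per year needed to recover the licence fee.

    Illustrative assumption: the developer keeps the full sale price.
    """
    # Work in pence so the division is exact integer arithmetic.
    licence_pence = round(licence_gbp * 100)
    price_pence = round(price_gbp * 100)
    return -(-licence_pence // price_pence)  # ceiling division


if __name__ == "__main__":
    for label, price in [("rail app", RAIL_APP_PRICE_GBP),
                         ("typical app", TYPICAL_APP_PRICE_GBP)]:
        needed = sales_to_cover_licence(ATOC_LICENCE_GBP, price)
        print(f"{label} at £{price:.2f}: {needed} sales per year")
```

Under these assumptions a £3 app needs 9,000 sales a year to cover the licence alone, while an app at the typical £0.59 price would need more than 45,000, which goes some way to explaining the higher price of rail apps.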

    3. Innovation around open data

    Data released to the market freely becomes available to drive self-funded community and commercial propositions. Communities without (significant) income streams e.g. commuter groups, school governors and health watchdogs can scrutinise and campaign from a position of knowledge.

    Companies can create income streams by adding value to the open data releases for convenience, immediacy or additional insights. By creating apps or web sites funded by advertising or subscription new business services can be created. As many of the available business models have low margins on low cost bases, they only work for free or marginal cost releases. This is because the open data can only be resold at very small markups as in an app… or because the open data enables a business to acquire users who don’t expect to pay for use of the information. However, once any digital services business has users there are multiple opportunities to monetise, e.g. by selling aggregate anonymised behaviour to other businesses trying to reach these audiences.

    4. Open data regulation

    Once open data releases have been made the government needs to regulate to ensure that

    • access follows public money
    • government itself does not compete with open data users
    • there is equality of access (net neutrality for data)
    • there is stability of distribution
    • the distributing agencies do not act like monopolies usually act (to hinder competitors & market entrants)

    These principles are needed to ensure that investments can be made in the private sector with confidence in the medium term.

    Placr’s business model for open data

    Given the availability of free or low cost data, Placr’s business model is to aggregate open data in transport, crime, schools and health and add value to data distributed through our API so it can be monetised in four ways:

    a) By ad-funded web distribution… Placr web pages with live departure data are now in the top 5 Google hits for searches like “Baker Street departures”. We are in the process of adding advertising to these pages now they have earned a high search ranking through simple, effective presentation. We are also creating social media feeds for all the bus stops and stations in the country on our new pla.cr transport shortcode service, so that comments and experiences can be shared by users and developers (see placr.mobi for the evolving web app prototype).

    b) By revenue-sharing with app developers who innovate with our API feeds at transportAPI.com. We have around 10 developers registered to use our feeds and the number is growing rapidly. We are cutting the time to market for developers and reducing the complexity of the raw data. We use a freemium style ‘click-wrap’ licence so that people can develop business models using our feeds and then pay us a flat 20% when they begin to take revenue. We are now starting to take revenue under this model.
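    As a rough illustration of the flat revenue share described above, here is a minimal sketch. Only the 20% rate and the rule that nothing is owed until a developer takes revenue come from the text; the function name and the sample figures are hypothetical.

```python
from decimal import Decimal

# Flat rate quoted in the post; everything else here is illustrative.
REVENUE_SHARE = Decimal("0.20")


def share_due(developer_revenue: Decimal) -> Decimal:
    """Return the amount owed under a flat revenue share (hypothetical helper).

    Nothing is owed while the developer has no revenue, which mirrors the
    freemium idea of building first and paying once money comes in.
    """
    if developer_revenue <= 0:
        return Decimal("0.00")
    # Round to whole pence.
    return (developer_revenue * REVENUE_SHARE).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for revenue in (Decimal("0"), Decimal("250.00"), Decimal("1000.00")):
        print(revenue, "->", share_due(revenue))
```

    Because the rate is flat, the share scales linearly with the developer's revenue, so there is no threshold at which a growing app is suddenly penalised.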

    c) By serving the operators and businesses in the market with analytics. We are developing tools to:

    • Provide service performance metrics in real time and over the service history
    • feed our pla.cr stop/station social media feeds with service updates, and
    • mine the conversations for communities/ behaviours that allow the operators insight into their customers

    This is a B2B service that is dependent on the audiences that use the B2C services… and this audience is built by presenting open data to them for free.

    d) By running consultancy and training for public and private sector bodies around open data opportunities, such as our Open Data Release Kit, an open consultancy and seminar offer for organisations looking to see a demo of API and web technology with their own datasets.

    The margins on a) and b) are still small… but they enable c) and d). It takes time to erect these ecosystems… and some patience is needed to build services and acquire funding. There are also huge risks for startups in innovating close to a government that has not stabilised policy. We are succeeding with the help of bodies like the TSB (see our LaunchPad application here [not funded, alas]). But we now need decisive action by government to challenge public sector data monopolies so that other organisations have the opportunity to leverage the value in the data.

    One of the chief barriers to progress is that we still don’t have free access to the crown jewels of government data, notably rail departures, postcodes/addresses, court judgments and planning applications. Access to these datasets is being restricted by:

    • monopoly self-interest in some agencies eg Royal Mail Address Management Unit for postcodes
    • IPR contamination by systems integrators handling state-funded data eg rail departures from ATOC, which has the effect of re-copyrighting the data
    • pre-existing cost-recovery arrangements that need reform eg Registry Trust handling court judgments on credit records needs harmonisation with other crime data releases

    These issues are in the public sector ‘too difficult’ box or will take time to resolve eg by action when licences roll over. But if we want the best open data in the world… and to create companies that can grow to dominate international markets… then the domestic market has to be liberalised across the board, not just where it is easy.

    Jonathan Raper

    UPDATE: in response to a sceptical comment about the prospects for open data by Steven Feldman, I posted some further arguments on the open data market as a comment on his blog here.

    May 11, 2011

    Why train departure information is not currently open data

    Filed under: open data — Jonathan Raper @ 11:26 pm

    Access to train departure information

    Users of the London DataStore will have noticed that there is no train departure information available as open data, and the issue generates a lot of questions from developers. This blog post attempts to explain why this is so and to let you know what the London DataStore has been doing about this in the meantime.

    The short answer to the question is that the Association of Train Operating Companies (ATOC), the only current player with a public-facing train departure information service and API, is a private sector body that does not release open data. ATOC’s National Rail Enquiries offers an API to its Live Departure Boards on a commercial basis and this is available to developers who meet their criteria and who are willing to pay for access. ATOC have granted some free licences to those not making any revenue from their service e.g. LiveTrains, so it may be worth applying to them if you fall into this category. There are also lots of commercial apps licensing data from ATOC, for example, myTrains for iPhone.

    However, given that the UK taxpayer subsidises the rail industry to the tune of £5bn a year, there are many open data campaigners who would like to get access to a genuinely free source of train departure information. If this applies to you, make a cup of tea, draw up a comfy chair, and read on for the full story.

    Train departure information, simplified

    Information on train departures in Britain is largely generated by Network Rail (the rail infrastructure owner) from a variety of signalling apparatus and train reporting services. This train departure information is then currently exclusively sub-licensed to ATOC for their National Rail Enquiries (NRE) service on phone and web for users outside the rail industry. NRE charges app developers or website owners for use of this data from Network Rail and imposes its own licence conditions under a code of practice approved by the Office of Rail Regulation (ORR). The net effect of this regime is that developers have to charge relatively high fees for apps using this data, up to £5 per app in the case of UK Train times. The NRE licence conditions also prohibit developers from being critical of the rail industry or having an adverse effect on TOCs. Paragraph 2 of the licence says “Applications which in NRE’s reasonable opinion are of demonstrable benefit to passengers will be granted unless outweighed by a material adverse impact on TOCs (whether financially, strategically, operationally or in regards to their reputation or the reputation of the industry as a whole).”

    Although there are regulators like the ORR and Passenger Focus that do valuable work, currently these bodies see the rail industry as solving the passenger information problem within the industry. Recently the ORR published a Passenger Information Consultation (PDF) that is open until 20th June for those who would like to put their own views across.

    Many developers would like the opportunity to innovate with train departure data to create apps and web sites that provide alternative views on this data, for example, local public transport aggregation sites, novel visualisations, delay monitors or cheapest fare finders. Developers believe that these kinds of services are a powerful voice for the consumer. So why can’t developers produce these free apps? It’s all down to the post-privatisation structure of the rail industry.

    Who owns information about rail services in Britain?

    The taxpayer does not own this information directly despite the public funding. Network Rail (not ‘National Rail’… that is an ATOC brand for train services) is a private company limited by guarantee that carries out publicly regulated tasks i.e. running the railway. It is a private company that can make profits… but since taxpayers are providing much of the income and also the financial guarantee, there is some public confusion about whether Network Rail belongs in the public sector. So, for example, the National Audit Office thinks it is a public body. The Information Commissioner has ruled that Network Rail is covered by the Environmental Information Regulations for public bodies (PDF) as it has a public task. Network Rail is also licensed and regulated by a statutory body, the Office of Rail Regulation. And Network Rail gets much of its money from two public sources: 1. the ‘Network grant’ (around £5bn annually from the taxpayer) and 2. Train Operating Companies, who get public subsidies of £450m per year (PDF) (see the Network Rail company report and accounts, note 3). So, Network Rail is a private body carrying out a regulated public task. It could suddenly become a public body if it defaulted on its debts, as its predecessor ‘Railtrack’ did in 2002.

    As a private body Network Rail does not have to follow the rules of the public sector regarding transparency in data, despite its public task. The government does not exercise its right to appoint a Director of Network Rail. Network Rail does not have to answer Freedom of Information requests (PDF). It is being left out of the Protection of Freedoms Bill and the Public Data Corporation in legislation this year. Public input to Network Rail is via the 100 ‘Members’ drawn from the public and the rail industry who act as stakeholders in holding the Board of Directors to account, rather like governors at a school. Although the appointment of members is, in principle, independent, the Board of Network Rail “will not, in particular, appoint individuals whom it feels wish to pursue concerns or objectives which are inconsistent with the overall purpose of the company.” Ultimately, therefore, Network Rail is able to follow its commercial interests when deciding how to license train departure information collected with public support. Note that Network Rail is required by its licence from ORR to improve reliability and efficiency, but not transparency.

    How the public becomes private

    At present Network Rail exclusively licenses train departure information outside the rail industry to the Association of Train Operating Companies (ATOC) for their National Rail Enquiries (NRE) service based around a database called Darwin. NRE integrate several sources of data from Network Rail and some from train operating companies to produce the Live Departure Boards web site and data feeds for apps (developers can read more below on how this is done). This is a non-trivial task, but not impossible for others to replicate, as, for example, Rockshore do for Network Rail. ATOC make charges for access to these data feeds and limit how they can be used with specific licence conditions. So, in effect, information that has been substantially funded by the public is now mixed with intellectual property from a fully private company, ATOC, a situation that bodies such as the Open Rights Group have been concerned about in the public sector as a whole.

    Going back in history, until February 2009 ATOC licensed train departure information under commercial terms to a very small number of organisations, mostly within the rail industry. Kizoom published the only smartphone app at that time, the free MyRailLite for iPhone. Then a dispute arose between ATOC and Kizoom, and ATOC withdrew Kizoom’s licence to use the train departure information. Kizoom complained to the ORR, who conducted an investigation (PDF) into whether ATOC had abused a dominant position under competition law. ORR decided that ATOC did have a dominant position in the supply of train departure information, but they “found no evidence that ATOC’s conduct in granting access to Darwin had prevented a new product from coming to market or hampered the emergence of new technology” in November 2009. When the free MyRailLite from Kizoom was taken off the market, it was immediately replaced by a £5 iPhone app from Agant which was marketed under the National Rail Enquiries brand.

    After the Kizoom case, to regulate the dominant position of ATOC in this respect, the ORR asked ATOC to produce a Code of Practice on data licensing, and this was introduced in April 2010. Despite this Code, disputes over the licensing of train departure information are still occurring as independent developer Alex Hewson recently found out. He asked for a free licence and then published the text of the refusal, to find himself banned from getting a licence even if he paid, as he was publicly critical of ATOC and NRE, and they deemed this a prima facie breach of the Code.

    The current situation… and where next?

    What we see in the current situation is public funding going into the creation of train departure information at the level of Network Rail infrastructure and in the public subsidies to the Train Operating Companies. However, as two private bodies have responsibility for the public task of running the railways and communicating to passengers, this information is encumbered by private intellectual property rights. It would nevertheless serve the transparency and accountability agenda if the raw feeds could be released as open data. The way forward might be to open access to Network Rail’s TD.Net through their External Services Gateway (ESG), allowing independent developers and system integrators to add value to raw data and make apps to communicate with the rail traveller (see developer section below). This would allow developers the choice of paying ATOC for access to NRE or accessing raw data via TD.Net and building their own services.

    This issue needs urgent attention at the time that the Public Data Corporation is being designed and ORR are running a Passenger Information Consultation. Train departure information is a key national dataset and a way needs to be found to make it available to developers to produce apps for the public and accountability for the regulators. If we need an example of good practice to motivate this, we should look at the example of the non-profit national bus departures aggregator Traveline who have announced that after a small connection fee, access to their national Nextbuses API is now free for developers to use to create free apps (like UK TravelOptions) up to a negotiated hit limit. This has created a situation where train departures are charged for and bus departures are free, even though both industries have a public/private structure.

    In summary then, developers looking for access to open data on train departures must put their faith in Network Rail and the Office of Rail Regulation to enlarge the scope of their vision for passenger information dissemination. Users of the London DataStore could play an important role in publishing passenger information, and we would like to see existing channels by which the rail industry publishes data internally (e.g. TD.Net through ESG) be opened up to external developers.

    Even more details… for developers

    Network Rail and ATOC have both developed sophisticated back office systems to handle train departure information. Comprehensive details are given in a report entitled ‘Integrated Passenger Information: Delivering the Rail “End to End” Journey’ by AECOM, commissioned by the Department for Transport Rail Group. The Stage 3 technical Annex (PDF) is 80 pages of detail about internal Network Rail systems for the serious geek.

    In summary, train departure information from timetables (Train Service Database – TSDB) and train describer information are aggregated into an internal Network Rail system called Control Centre of the Future (CCF). Note that some sections of line do not have train describers and need lower level systems to provide train locations, so there is not complete uniformity across the network. Train delays are recorded into a system called TRUST to ensure that train or freight operating companies (TOCs/FOCs) get charged if they are the cause of any delays. Data from CCF and TRUST are available to users in the rail industry through the Network Rail messaging service known as TD.Net via Network Rail’s External Services Gateway (ESG). Train Operating Companies have their own Customer Information Systems that integrate data from their operations, notably a messaging service on day-to-day operations (e.g. cancellations) called Tyrell and information on the formation of trains from a rolling stock system called GEMINI. Any developer building train departure services would need to look at what they could build just with access to TD.Net. It remains to be seen what information on cancellations and train formation TOCs might make available to independent developers.

    Network Rail is now investing in some new systems to improve this complex set of legacy technologies, including GSM-R for communication with trains and GPS for train positioning, which will be integrated into its Intelligent Traffic Management (ITM) strategy. As the AECOM report points out, “ITM could provide train location information to a far greater granularity, improving the accuracy of train running information for passenger information systems. This information could then be accessed by 3rd party systems.” (p13). As passengers already use smartphones with location services to report delays through crowd-sourcing services like @UKtrains using Twitter, in future passenger information from the industry will have to meet higher specifications to meet passenger expectations. This is a further driver to add to the expectations of openness for taxpayer funded public tasks.

    February 17, 2011

    BeingOpen and Levels of OpenStreetMap Use

    Filed under: open data,UK TravelOptions — harry.wood @ 6:09 pm

    Yesterday I took an afternoon off from normal placr work to attend BeingOpen, a conference on open technologies.

    Both Paul Clark and Chris Thorpe gave excellent talks on how the UK open data movement has a long way to go. Expenses data has made some headlines, but as Chris put it “I’m bored with transparency data and armchair auditing“. They mentioned transport data and “infrastructural” (geolocated assets) data as an example of an area where the open data movement can be more transformative.

    I gave a talk about my big passion, OpenStreetMap.

    In the example use cases I sneaked in a screenshot of placr’s UK TravelOptions iPhone app. Our partners, faster imaging, have produced one of the best client rendering engines for OpenStreetMap that I have seen, but actually a static screenshot doesn’t do it justice. You need to experience the fluidity of map view manipulation using gestures within the app itself (Try it. It’s free!)

    I tried to talk in broad terms about business aspects of OpenStreetMap. More of these are described on my blog, and all the slides are on slideshare, but here I thought I would highlight one particular slide which illustrates different levels of OpenStreetMap usage and involvement which companies or individual developers should consider:

    First of all (Level 1) the basic mashup approach, using the rendered map tiles on your website. Dead easy. Please do it! You can get pretty advanced with the data mashing in JavaScript on top of that.
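    To give a feel for what using the rendered map tiles involves at Level 1, here is a minimal sketch of the standard "slippy map" arithmetic that turns a latitude/longitude into a tile address. The formula is the widely used Web Mercator tile scheme; the function names are my own, and the URL pattern shown is the public OpenStreetMap tile server, which has its own usage policy, so treat this as an illustration rather than a recipe for heavy traffic.

```python
import math


def deg2tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Convert WGS84 latitude/longitude to slippy-map tile x/y at a zoom level."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_url(lat_deg: float, lon_deg: float, zoom: int) -> str:
    """Build a tile URL in the usual z/x/y.png layout (illustrative helper)."""
    x, y = deg2tile(lat_deg, lon_deg, zoom)
    return f"https://tile.openstreetmap.org/{zoom}/{x}/{y}.png"


if __name__ == "__main__":
    # Roughly central London at zoom 10.
    print(tile_url(51.5, -0.12, 10))
```

    In practice a JavaScript map library does this calculation for you; the point is only that a tile mashup needs nothing more than an image URL per tile.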

    For the more adventurous, you’ll be wanting to download the raw data (Level 2) perhaps because you’ve thought of a service to build on it. You would do this either through the OpenStreetMap API, or (more likely) as a planet download. The data is improved all the time by the community though, so think about pulling in regular updates (Level 3) using diff downloads.

    If you have a business which involves geo-located data, maybe there’s some data you could put in to OpenStreetMap (Level 4). There are many pitfalls, but done well, sharing in this way could help your business, and you may even benefit from a virtuous circle as the OSM community improves the accuracy of your data.

    Finally, communicate with the OpenStreetMap community and work with them. The community is OpenStreetMap’s great strength; however, communication can be quite tricky because it’s a rowdy, chaotic, loose-knit group. Lots of busy contact channels. Ask any question and you’ll get ten different answers. The Q&A site help.openstreetmap.org is a useful new channel which may yield a sane answer, but placr (and myself in particular) are happy to help with OpenStreetMap business ideas, so feel free to just contact us for a discussion.

    Harry Wood

    February 1, 2011

    Five reasons to be cautious about street level crime data

    Filed under: Comment — Jonathan Raper @ 6:23 pm

    Lots of ‘street level crime’ #opendata released today on Police.uk. This ought to be another great moment for the #opendata movement, and in one sense it is. The government has stuck to its promise to release this data, and it has forced the police to produce it on time. It is good that politicians now see that opening data will promote a debate and enable citizens to discuss the issues with the professionals. This is the real promise of #opendata: it helps empower people by promoting more active questioning of the issues.

    However, this is another example of ‘ugly, early’ and we must look at the data very carefully to see what it is currently good for. Here are 5 reasons to be cautious about the insights it reveals at this stage.

    • The locations used for the map points are somewhat suspect as these quotes from the site indicate. “The location of incidents shown is approximated and indicative only. This is to protect the anonymity of individuals.” “Incidents of crime or anti-social behaviour are mapped to an anonymous point on or near the street where it happened.” What does this mean? If the police shift points around in urban areas then they just move the crime to places that may have had no crime. Press reports eg Guardian top 100 crime streets already suggest that some of these locations are actually surrogates for the real problem nearby. Incidentally, there is no guarantee that they are going to aggregate and report crime to the same points each month, so we won’t be able to compare through time.
    • For privacy reasons data is not shown for “… streets with fewer than 12 postal addresses”. What happens to these crimes… are they shifted next door or do they vanish? This is an arbitrary number to comply with advice from the Information Commissioner’s Office on privacy. But ICO’s advice still applies even if we can get the data from other sources eg Court Service, so these public censorship measures to protect privacy are in reality a sort of ‘moral panic’ about detailing a truth most people already know from other sources.
    • Some data is redacted eg sexual offences, murder. The Metropolitan Police has already released this data to ward level though… and it is easy to cross-reference one murder in one ward to reports in the local press at the same time:
      7:55am Monday 13th December 2010
      Richard Davies Jones, a solicitor from Woodfield Lane, Lower Ashtead, was charged in the early hours of this morning with the murder of 31-year-old Laura Grace Emily Davies Jones, a social worker from the same address.
      It’s a simple job to match this name in the Electoral roll. Local aggregators already gather and publish this eg Belocal. Just put in your postcode and Twitter name and you get a semi-real time feed of local news including crime reports.
    • The data covers reported crimes not convicted criminals ie some of this activity turned out not to be crime. In some places people may be more or less disposed to report crime.
    • There is nothing here on the burden of policing in each area, so in many cases two areas with the ‘same crime levels’ side by side will have radically different experiences of crime and policing.
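    The question raised above about small streets ("are they shifted next door or do they vanish?") matters because the two answers give different published totals. Here is a minimal sketch of both behaviours. The threshold of 12 addresses is quoted from the site; the street names, address counts, neighbour mapping and the two policy names are all hypothetical, and nothing here claims to be how Police.uk actually processes the data.

```python
from collections import Counter

MIN_ADDRESSES = 12  # threshold quoted from Police.uk


def publish_counts(incidents, addresses, neighbour, policy):
    """Count incidents per street under one of two hypothetical policies.

    incidents: list of street names, one entry per reported incident
    addresses: dict of street -> number of postal addresses
    neighbour: dict of small street -> nearby street to reassign to
               (assumed to meet the threshold itself)
    policy:    "drop" (incident vanishes) or "shift" (moved next door)
    """
    if policy not in ("drop", "shift"):
        raise ValueError(f"unknown policy: {policy}")
    counts = Counter()
    for street in incidents:
        if addresses[street] >= MIN_ADDRESSES:
            counts[street] += 1
        elif policy == "shift":
            counts[neighbour[street]] += 1
        # Under "drop" the incident is simply not published.
    return counts


if __name__ == "__main__":
    incidents = ["High St", "High St", "Mill Lane"]
    addresses = {"High St": 40, "Mill Lane": 5}
    neighbour = {"Mill Lane": "High St"}
    print(publish_counts(incidents, addresses, neighbour, "drop"))
    print(publish_counts(incidents, addresses, neighbour, "shift"))
```

    With the made-up data, "drop" under-reports the area total while "shift" inflates the neighbouring street, which is why knowing the rule actually applied is a precondition for comparing streets.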

    So. It is great to see the data released, but this is only the first step. We need to support the police to better locate the crime and we need better visualisation of the data to set it in the context of demographics, policing and local geography. To contribute to this process Placr has developed a multiresolution crime browser for the data. In these maps you always get an overview of the patterns geographically and of each type of crime within the total until you zoom to the full point level detail. It gives a wonderful overview of the crime patterns within cities, exposing each neighbourhood’s experience of crime. Matching this with ‘burden of policing’ data would allow us to see if police time is being spent where the crime reports are.

    There are now a bunch of research questions including where is there ‘excess’ crime given the density of population or poverty/wealth indicators? If we build on this release and enhance the data, perhaps the police can build a new relationship with the communities they serve through a new dialogue, and that would be a big win.
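    One simple way to frame the 'excess' crime question is to compare each area's reported crimes with what it would have if crime were spread in proportion to population. The sketch below does only that; the area names and figures are invented, and a real analysis would also need the poverty/wealth indicators and policing data mentioned above.

```python
def excess_crime(areas):
    """Return observed minus population-proportional expected crimes per area.

    areas: dict of area name -> (reported_crimes, population)
    A positive value means more reported crime than the overall rate predicts.
    """
    total_crimes = sum(crimes for crimes, _ in areas.values())
    total_population = sum(pop for _, pop in areas.values())
    overall_rate = total_crimes / total_population  # crimes per resident
    return {
        name: crimes - overall_rate * pop
        for name, (crimes, pop) in areas.items()
    }


if __name__ == "__main__":
    # Hypothetical figures for two equally sized areas.
    sample = {"Area A": (30, 1000), "Area B": (10, 1000)}
    for name, excess in excess_crime(sample).items():
        print(name, round(excess, 1))
```

    The excess values always sum to zero across the areas considered, so this measure only ranks areas against each other; it says nothing about whether the overall level is high or low.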

    Jonathan
