It’s crazy how much Transport for London can learn about us from our mobile data
It’s a well-known maxim in the tech industry that if you’re not paying for the product, then you are the product. We get to use incredible services like Gmail, Facebook and Twitter for free – and in return, the big tech companies sell access to our eyeballs to advertisers.
But this isn’t always the case. Sometimes, even when we pay for a service, we’re also the product being sold.
For example, something that EE, O2 and Vodafone all do, but don’t really like to shout about, is sell anonymised, aggregated data on our physical movements to local authorities, transit agencies and any other companies with a large enough chequebook.
And that’s why today I’m going to tell you about some of the really mad things that Transport for London (TfL) can work out about us by using our location data, provided by the O2 mobile network.
Using the Freedom of Information Act, I’ve managed to obtain the Data Protection Impact Assessment and the Statement of Work for TfL’s Project EDMOND – which stands for “Estimating Demand from Mobile Network Data”.
That’s right, this week’s newsletter is dangerously close to being actual reporting instead of just my usual bloviating. And having now fallen down the rabbit-hole digging into it, I’m amazed by the quality of information it gives transport planners and policy makers. And honestly, I’m a little freaked out.
So let’s dive in and explore it together.
The way EDMOND works is very clever. TfL isn’t actually tracking all of our phones all of the time, presumably because it knows that to do so would be hugely controversial.
So instead, it contracts with O2 to license data over shorter periods of time. For example, in 2023, it took data from ‘up to’ 40 normal weekdays between the start of April and the end of June, when nothing unusual was happening, like school holidays or bank holidays.
This is an enormous dataset, potentially covering up to 25 million phones, but it still doesn’t include everyone in London because some people use other networks like EE, Vodafone and so on.
So it’s important to understand that EDMOND isn’t just a pile of data – it’s a model, where TfL has taken the data from O2 and done some clever maths to scale it up to estimate the movements of everyone in London over the age of 12.
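As a rough illustration of what "scaling up" can mean, and emphatically not TfL's actual method, here is a minimal Python sketch. It assumes a simple per-area expansion factor (residents divided by phones observed on one network) applied to sampled origin-destination trip counts. The function names, area names and all the numbers are hypothetical.

```python
def expansion_factors(observed_phones, population):
    """Return population / observed phones for each area (assumed weighting)."""
    return {
        area: population[area] / count
        for area, count in observed_phones.items()
        if count > 0  # skip areas with no phones to avoid dividing by zero
    }


def scale_trips(sample_trips, factors):
    """Scale sampled origin-destination trip counts by the origin area's factor."""
    return {
        (origin, dest): trips * factors[origin]
        for (origin, dest), trips in sample_trips.items()
        if origin in factors
    }


# Made-up numbers: phones seen on one network vs. residents aged over 12.
observed_phones = {"Area A": 2000, "Area B": 1500}
population = {"Area A": 8000, "Area B": 4500}
sample_trips = {("Area A", "Area B"): 300, ("Area B", "Area A"): 200}

factors = expansion_factors(observed_phones, population)
estimated_trips = scale_trips(sample_trips, factors)
print(estimated_trips)
```

A real model would presumably weight by more than home area (age, network market share and so on), but the principle of multiplying a sample up to a population is the same.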
There’s also the elephant in the room. Though it may be surprising to learn that O2 is selling data insights on its users, it’s not selling personal data. What’s being sold by O2 and licensed by TfL is aggregated, anonymised data.
This means TfL can’t see the movements of individual people, and of course everything is fully GDPR-compliant and above board – as you’d expect for a major corporation and a transport agency.
In fact, according to the 2018 Travel in London report, any time the data suggested there were fewer than ten phones in a given statistical area, the data was automatically excluded so as to avoid inadvertently unmasking people based on their metadata.
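The "fewer than ten phones" rule is a standard small-count suppression step. A minimal sketch in Python, assuming the counts arrive as a simple mapping from area to number of phones (the function name, data shape and figures are hypothetical; only the threshold of ten comes from the report):

```python
MIN_PHONES = 10  # threshold described in the 2018 Travel in London report


def suppress_small_counts(counts, threshold=MIN_PHONES):
    """Drop any area whose phone count is below the threshold."""
    return {area: n for area, n in counts.items() if n >= threshold}


# Made-up counts for three statistical areas.
counts = {"Area A": 124, "Area B": 9, "Area C": 10}
safe_counts = suppress_small_counts(counts)
print(safe_counts)  # "Area B" is excluded because it has fewer than ten phones
```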
So to be absolutely clear, there’s no big scandal here. In fact, using this sort of data is increasingly routine for local authorities and others, to the extent that O2 even has a brand name for this line of its business – “O2 Motion”.
But that doesn’t mean what’s happening isn’t interesting. In fact, I’m willing to bet that most people outside of the mobile industry are completely unaware their movement data is being used in this way.
Now let’s get to the good stuff. What does all of this data do for TfL, and what data does it have to play with?
Because of the aforementioned privacy restrictions, they don’t simply get dots on the map showing them where everyone was. Instead, the data is broken down into hundreds of “Middle Layer Super Output Areas (MSOAs)” – this is a statistical standard that divides up the country into groups of between 2,000 and 6,000 homes.
Here’s a map showing London’s MSOAs:
Looking at this, you can see why data at this level could be useful.
Using the aggregated data from O2, TfL can see which areas of London people are travelling from and where they’re travelling to – which is exactly the sort of information you might want if you were, for example, planning where to run buses or impose an Ultra Low Emission Zone that disincentivises car use.
It goes deeper. As you can see above, it’s possible to work out which parts of London are hosting the most international visitors, by seeing which MSOAs have the most phones using international roaming within their boundaries. (Unsurprisingly, it turns out the busiest areas for international visitors are the West End and Heathrow.)
But here’s the other crazy thing. Whether your SIM card is roaming is not the only thing that O2 knows about its users. In fact, because it has demographic data on its contract customers, it’s possible to break down the demographics of people in each MSOA by gender and age – as well as the time of day they were there.
Here’s some made-up example data showing just that, from one of the documents I received:
Arguably the creepiest column above is the one labelled “type”, which sorts people into “Resident”, “Worker” and “Visitor”. Because O2 doesn’t just know where you are, it knows why you’re there too.
How does it do this? By making some smart assumptions.
For example, it determines your home by looking at the place where you spend most evenings and nights during the prior month. It also figures out where you work based on where you spend working hours on weekdays. And according to the documents I’ve obtained, it appears that the latest 2023 modelling may also be working out when people are specifically travelling to education institutions (i.e. schools and universities) too.
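Those assumptions are simple enough to sketch in a few lines of Python. To be clear, this is an illustration of the heuristic as described, not O2's actual code: the hour boundaries (19:00 to 07:00 for "evenings and nights", 09:00 to 17:00 for "working hours"), the function names and the sightings are all my own assumptions.

```python
from collections import Counter
from datetime import datetime


def is_night(ts):
    """Assumed definition of 'evenings and nights': 19:00 to 07:00."""
    return ts.hour >= 19 or ts.hour < 7


def is_working_hours(ts):
    """Assumed definition of 'working hours': Monday to Friday, 09:00 to 17:00."""
    return ts.weekday() < 5 and 9 <= ts.hour < 17


def most_common_area(sightings, predicate):
    """Return the area seen most often at times matching the predicate."""
    counts = Counter(area for area, ts in sightings if predicate(ts))
    return counts.most_common(1)[0][0] if counts else None


def person_type(area, home, work):
    """Label a sighting the way the 'type' column does."""
    if area == home:
        return "Resident"
    if area == work:
        return "Worker"
    return "Visitor"


# Made-up sightings of one phone: (area, timestamp).
sightings = [
    ("Area A", datetime(2023, 5, 1, 22, 0)),  # Monday night
    ("Area A", datetime(2023, 5, 2, 6, 0)),   # Tuesday early morning
    ("Area B", datetime(2023, 5, 2, 10, 0)),  # Tuesday working hours
    ("Area B", datetime(2023, 5, 3, 14, 0)),  # Wednesday working hours
    ("Area C", datetime(2023, 5, 3, 23, 0)),  # Wednesday night
    ("Area C", datetime(2023, 5, 6, 13, 0)),  # Saturday afternoon
]

home = most_common_area(sightings, is_night)
work = most_common_area(sightings, is_working_hours)
print(home, work, person_type("Area C", home, work))
```

With these sightings the phone is counted as living in Area A, working in Area B, and merely visiting Area C.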
So TfL isn’t just able to figure out where people are travelling to and from, but why they’re travelling. But amazingly, the model gets even smarter than this.
To my mind, the most impressive thing about the EDMOND model is that it can apparently accurately predict by what means you’re travelling – whether by foot, bike, car, train, bus or even lorry.
This is a really hard question from a technical perspective. The obvious way to do this would be for O2 to look at the speed your dot on the map is moving and the route along which it is travelling. That way, it could conceivably match you up to known bus routes or railway lines. In fact, this is how railway travellers are identified – they look at where clusters of users appear to be moving together, as indicative of groups of people inside a train carriage.
But on London’s busy streets, traffic will often crawl at walking pace. And besides, how can they work out if you’re in a car, taxi or bus? Or even on a bike? How can it tell the difference?
I’ve seen other devices attempt to figure this out before. My Apple Watch will ask me if I’m cycling when it detects that I’m moving at cycle-like speeds and my heart rate is elevated. And Google Maps will sometimes ask me to rate my bus journey, if it detects that I’ve just checked when the next bus is coming and have then travelled along the route of that bus.
But both of these things require access to either the contents of my phone or a monitor for my heart rate, neither of which O2 has access to.
So instead, the EDMOND model goes into its mind palace and reasons out an answer using pure logic, by leaning on TfL’s Public Transport Accessibility Level (PTAL) data.
PTAL is a map of London, divided into thousands of individual squares, each of which has been given a score for how accessible public transport is. Scores range from zero (the bits of countryside that are still technically in Greater London) to 6B – imagine you’re standing just outside King’s Cross station with half a dozen tube lines, national rail and countless buses to choose from. You can see the PTAL score for where you live on this interactive tool here.
So how does TfL work out how you are travelling? Because it knows where you’re travelling and the route you’re taking, EDMOND looks at the PTAL scores for the different locations you pass through, and basically makes a prediction about the means by which you’re travelling, based on what transport options were available to you.
And here’s the crazy thing: TfL knows this actually works, as it has validated these predictions against data collected by its more traditional London Travel Demand Survey, which is posted out to households and filled in by people with clipboards. And the data all lines up.
So as you can tell, I think this is pretty amazing. From some aggregated dots on the map, the time of day and some demographic data, TfL has built up a really detailed picture of how people are moving around London.
And EDMOND data is, unsurprisingly, widely used internally. It is one of the key tools being used to forecast cross-river traffic for the controversial Silvertown Tunnel, which is currently under construction in East London. And it has also been instrumental in forecasting traffic patterns when rolling out the even more controversial ULEZ scheme.
More generally, it also feeds data into TfL’s other, even bigger strategic models, like MoTiON, which incorporates mobile data, Oyster card taps, real-time bus data, Boris Bike hires and even data from the cycling app Strava to model how London travels. Which I guess is what it takes if you want to keep a city of nine million people moving.
Still, despite the obvious utility of EDMOND, I’m not sure how I feel about it.
Why? Because the privacy implications weird me out a bit… And I know I’m not the only one.
I’m not really a scoop-getting journalist. I’m more of an “I spoke to some experts and they agree it’s complicated” guy. But back in 2017, I broke the story of what TfL had learned from tracking phones on the tube network using wifi.
The story is similar to this one: by using wifi pings from our phones, TfL can plot how we’re moving around the Tube network, even if we haven’t connected to the Tube wifi network. (I recommend clicking the link for some awesome diagrams.)
One document I obtained for that story also contained the results of a focus-group study conducted for TfL, basically testing the attitudes of the public to different types of mobile data tracking. You can see the results in this matrix:
As you can see above, wifi tracking was received pretty well. People understood the purpose of doing so (it helps TfL manage the Tube network, and can work out how crowded trains are – data which can today be seen by passengers in transport apps). And the data collection was perceived as relatively transparent, presumably because tracking can be sign-posted inside stations, and because it’s possible to opt out by switching off the wifi on your phone.
By contrast, taking data direct from the mobile networks, as seen in the bottom left of the diagram, was more poorly received – with people unsure why it was being done, or how they would benefit.
I think this reaction from the public is pretty understandable. It’s really the case that the major phone networks are recording our movements and building up a surprisingly detailed picture of the places we sleep, work and hang out.
But to be fair, TfL does appear to have approached using the data in the most minimising and proportionate way possible – by taking a small sample of anonymised data and using it to build a model, instead of setting up a system that collects data around the clock, in real time. So I can’t really fault TfL for using the data. The fact that O2 is willing to sell these anonymised data insights must be irresistible if you’re managing a transport system, especially in a city as complicated as London.
So what I’m perhaps more sceptical of is the fact that this data is being collected and sold by the phone networks in the first place.
Because think for a moment about what they have to collect to make it work. It means that somewhere inside my phone network’s databases are my movements for at least the last month – revealing everywhere I’ve been, who was in the same place as me at the same time, and just how many times I’ve shamefully stopped in at Burger King on the way home because I can’t be bothered to cook anything.
Though TfL and the networks’ customers only get the data in an anonymised, aggregated form, that’s not true of the networks themselves, who need to store and analyse my actual data to make it useful.
Perhaps, arguably, this is a good thing. You can imagine how, for example, it could make it easier for the authorities to track down a terrorist on the run. Slightly further down the slippery slope, you can imagine how this same tech could also be used by the security services to find everyone who attended the Hizb ut-Tahrir protest over the weekend, and within seconds tell them not just where attendees live, but where they are right now.
Whether that would be proportionate or not, I’m not sure. But what’s clearly true is that at a certain point, the civil liberties arguments become important.
One useful gut-check of any new technology is to imagine how it might be used by a bad actor. You don’t have to think too hard to imagine how such tracking technology could be misused by an authoritarian if they were to somehow come to power. There’s definitely potential for what Edward Snowden calls “turnkey tyranny”.
The question for policy makers and voters, then, is how do we trade off the utility of data analytics like the networks are selling against the potential risk that we’re building tools that could conceivably, if things go badly wrong, be used as tools of oppression?
And as an unsatisfying ending to this piece, I’m not sure where the balance lies in the specific case of EDMOND.
I think it’s definitely a strong case for why we should do our best to build strong institutions to protect our democracy, so that we can more safely take advantage of new technology, and build a more functional society, without any of our rights feeling threatened.
But even if we decide that the networks closely tracking us is fine because it’s useful, we should be careful to maintain a healthy scepticism too. And that starts with actually understanding it’s a thing that happens. So at least now you’ve reached the end of this article, you and hopefully a few other people will know about it. Now if only EDMOND could also work out what we should do with this information.