The rush for location databases

Date: Monday May 10, 2010

Coming back from the rather buzzy Location Business Summit Europe 2010, a few things were clear to me: operators haven’t yet written down their location strategies, location-based advertising (LBA) is a reality that agencies are now very interested in, and everybody thinks it’s a great idea to build their own location databases.

The most popular way to do this is not to send funny vehicles around every street in the world like Google and Skyhook did but to let your users do it for you.

So here comes the sniffer software, audacious little spies on your phones that not only tell you where you are but tell their masters where the Cell-ID and Wi-Fi networks are in relation to your GPS location.

It’s innocuous enough if battery and privacy are not important issues for you. I ran Google Maps in London for a full 40 minutes before my battery gave up… enough time for Google to know exactly where I went shopping that day.

However, Google is only half the story this time. There is a real awareness from the different LBS players about the need to own the location provision. Operators want Wi-Fi databases, device manufacturers want everything (Cell-ID, Wi-Fi and more if available) and application developers want indoor location and maps.

The point here is to provide fast and cheap location to devices and applications not covered by the free APIs of Google and Apple. Access to these location databases will provide fallback to Wi-Fi positioning and then to Cell-ID triangulation when GPS is not available. This is not to be confused with Assisted GPS (A-GPS); these databases do not (yet) actually help GPS achieve a faster time to first fix (TTFF). This will however come in the next version of SUPL, the standard by which A-GPS data is exchanged.
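To make the fallback order concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Fix` record, the three provider functions and their coordinates are invented for illustration and do not correspond to any vendor's API. It only shows the idea of trying GPS first, then Wi-Fi, then Cell-ID.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Fix:
    lat: float
    lon: float
    accuracy_m: float  # rough error radius in metres
    source: str


# A provider returns a Fix, or None when it cannot locate the device.
Provider = Callable[[], Optional[Fix]]


def locate(providers: Sequence[Provider]) -> Optional[Fix]:
    """Try each provider in order and return the first fix obtained."""
    for provider in providers:
        fix = provider()
        if fix is not None:
            return fix
    return None


# Hypothetical providers with made-up values, for illustration only.
def gps() -> Optional[Fix]:
    return None  # pretend we are indoors with no satellite view


def wifi() -> Optional[Fix]:
    return Fix(51.5145, -0.1420, 40.0, "wifi")


def cell_id() -> Optional[Fix]:
    return Fix(51.5150, -0.1400, 800.0, "cell-id")


fix = locate([gps, wifi, cell_id])
print(fix)
```

With GPS unavailable, the Wi-Fi database answers; if that also came back empty, the much coarser Cell-ID estimate would be used instead.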

That’s probably why I keep meeting interesting but rather new companies like Location-API that are providing sniffed databases to the biggest bidders such as Statsis, Deveryware or Combain.

Keeping the data current

Location-API has one of the biggest databases of Cell-IDs, claiming 6.78 million of them worldwide. This is a huge database, so how do they keep it updated?

Rikard Windh, co-founder, Combain Mobile AB, explains: “It keeps updated with a continuously significant flow of positions every day. For the calculation of the Cell-ID and Wi-Fi AP position we only use the 100 latest submitted positions. If Cell-ID changes or the Wi-Fi spot moves, it will be adjusted automatically after a while.”
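The "100 latest submitted positions" rule can be sketched as a sliding window per transmitter. This is only an illustration: the quote does not say how the positions are combined, so the plain average used here is an assumption, and the `TransmitterIndex` class and its method names are hypothetical.

```python
from collections import defaultdict, deque

WINDOW = 100  # number of latest submissions kept, as stated in the quote


class TransmitterIndex:
    """Keeps the latest reports per Cell-ID or Wi-Fi AP (hypothetical class)."""

    def __init__(self, window=WINDOW):
        # A deque with maxlen silently drops the oldest report when full.
        self._reports = defaultdict(lambda: deque(maxlen=window))

    def submit(self, transmitter_id, lat, lon):
        self._reports[transmitter_id].append((lat, lon))

    def estimate(self, transmitter_id):
        """Return the mean of the stored reports, or None if there are none.

        Assumption: a simple average of latitude and longitude. This is
        adequate over short distances but ignores signal strength and
        would misbehave across the antimeridian.
        """
        reports = self._reports.get(transmitter_id)
        if not reports:
            return None
        n = len(reports)
        mean_lat = sum(lat for lat, _ in reports) / n
        mean_lon = sum(lon for _, lon in reports) / n
        return (mean_lat, mean_lon)
```

Because old reports fall out of the window, an access point that moves is gradually pulled to its new position as fresh submissions arrive, which matches the "adjusted automatically after a while" behaviour described above.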

For Wi-Fi, Statsis in the UK is mapping access points in various markets where they have trials with operators and service providers. They focus on the indoor mapping segment.

Rob Palfreyman, CEO of Statsis, explains that their approach is to run the sniffer on the device so the database is updated off-line and fed back to the server only when convenient. This saves power and avoids roaming data charges. Also the database can be loaded on the device before travelling to enable off-board location and avoid data charges (the file will typically be less than 2MB).
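A rough sketch of that "collect off-line, upload when convenient" pattern follows. It is not Statsis code: the `OfflineSniffer` class is hypothetical, and the definition of "convenient" used here (on Wi-Fi, charging and not roaming) is my own assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (transmitter id, latitude, longitude)
Observation = Tuple[str, float, float]


@dataclass
class OfflineSniffer:
    """Buffers observations locally and uploads them in one batch."""

    upload: Callable[[List[Observation]], None]
    pending: List[Observation] = field(default_factory=list)

    def record(self, transmitter_id: str, lat: float, lon: float) -> None:
        # No network traffic here: the observation is only stored on the device.
        self.pending.append((transmitter_id, lat, lon))

    def maybe_flush(self, on_wifi: bool, roaming: bool, charging: bool) -> bool:
        """Upload the buffer only when it is cheap to do so.

        Assumed policy: never while roaming, and only on Wi-Fi while charging.
        """
        if not self.pending or roaming or not (on_wifi and charging):
            return False
        self.upload(list(self.pending))
        self.pending.clear()
        return True
```

In use, `record` would be called on every scan while travelling, and `maybe_flush` from a periodic check; nothing leaves the device until the conditions are met.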

Statsis is currently engaged in trials with a major MNO and two major silicon vendors. They also support the Symbian Foundation with contributions and have their own brand applications.

For clarification: unlike Android, Symbian will not provide free location data to application developers but will let device manufacturers and developers choose between the different available sources. It will be up to the device manufacturers to decide who pays for location data on Symbian^3 and Symbian^4.

The largest database is however Skyhook’s, with more than 200 million APs mapped.

Bootstrapping the crowdsourced data

I asked Ted Morgan, CEO of Skyhook, if all iPhones were systematically sniffing for Cell-IDs and Wi-Fi APs to update the Skyhook database.

“We can’t talk about specific devices of customers, but our core location engine does not background sniff or probe for cell or Wi-Fi data. We merely use the user-driven location requests to update and improve our data. It is only done when the user is requesting location, unlike how others do it quietly in the background several times a day.”

So 10 million iPhones and iTouches in Europe improve the Skyhook data? How does that compare with the few thousand Skyhook wardrivers? And would it be correct to suggest Skyhook’s database is now mostly crowdsourced?

“No, we feel very strongly that a reliable location system requires both systematic field (drive) scanning and device updating. One or the other is not sufficient.

“Because crowdsourced data by definition follows the crowd, so it is dense in some areas and very spotty in others. Good positioning requires a balanced set of reference points on all sides.

“When you have them clumped on a road, you get poor results. Think of a highway with cell towers following the highway. As long as you are on the highway, you can use the cell towers fairly well for location, but once you get off the highway, those cell towers do a poor job of telling you where you are. Also you can’t build up the crowdsourced data unless the crowd has apps it wants to use and those apps need location in the first place. So the driving data helps bootstrap the crowdsourced data to a certain extent. And then going forward the driving is less frequent but still needed to level set the data.”
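The highway example can be illustrated with a little geometry. The sketch below is a simplification under stated assumptions: positions are in flat x/y metres, the tower coordinates are invented, and location is assumed to be derived from distances to towers, which is only one of several ways cell positioning can work. It shows that when all reference points sit on one line, a spot north of the road and its mirror image to the south look identical.

```python
import math


def distances(point, towers):
    """Straight-line distance from a point to each tower, in metres."""
    return [math.dist(point, tower) for tower in towers]


def indistinguishable(a, b, towers):
    """True if the two points are the same distance from every tower."""
    return all(
        math.isclose(da, db)
        for da, db in zip(distances(a, towers), distances(b, towers))
    )


# Made-up layout: three towers strung along a highway on the x axis.
highway_towers = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 0.0)]
# Same towers plus one reference point away from the road.
balanced_towers = highway_towers + [(1000.0, 800.0)]

north = (500.0, 300.0)   # 300 m north of the highway
south = (500.0, -300.0)  # 300 m south of the highway

print(indistinguishable(north, south, highway_towers))
print(indistinguishable(north, south, balanced_towers))
```

With only the highway towers the two points cannot be told apart; adding a single reference point off the road breaks the symmetry, which is the "balanced set of reference points on all sides" argument in miniature.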

So how do sniffing technology and methodology compare?

I did ask Ted that very question, but the guys at Xconomy got there first (it’s not fair, they are both in Boston!). So here is the answer:

“There are a couple of different approaches to getting the signal data; one of them is active scanning, and the other is passive sniffing. Both techniques have their pros and cons, but when you are doing the passive sniffing you have to make sure you are not accessing private network messages. It’s not a hard thing to do; you just do not record those messages.”
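As an illustration of "you just do not record those messages", here is a minimal sketch of a filter that keeps only what positioning needs: the access point's MAC address (BSSID), its network name (SSID) and the signal strength. The `Frame` and `Sighting` types are hypothetical stand-ins invented for this example, not the API of any real capture library, and this is not a description of how Skyhook or Google actually do it.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Frame:
    """Simplified stand-in for a captured Wi-Fi frame (hypothetical)."""
    kind: str              # "beacon" for AP announcements, "data" for user traffic
    bssid: str             # MAC address of the access point
    ssid: Optional[str]    # network name, if announced
    rssi_dbm: int          # received signal strength
    payload: bytes = b""   # user traffic; must never be stored


@dataclass(frozen=True)
class Sighting:
    """The only fields written to the location database."""
    bssid: str
    ssid: Optional[str]
    rssi_dbm: int


def sightings(frames: Iterable[Frame]) -> Iterator[Sighting]:
    """Keep beacon identifiers only; data frames are dropped unread."""
    for frame in frames:
        if frame.kind != "beacon":
            continue  # user traffic is skipped, its payload never copied
        yield Sighting(frame.bssid, frame.ssid, frame.rssi_dbm)


captured = [
    Frame("beacon", "00:11:22:33:44:55", "CoffeeShop", -61),
    Frame("data", "00:11:22:33:44:55", None, -63, payload=b"GET /inbox"),
]
stored = list(sightings(captured))
print(stored)
```

The design choice is that the stored record type has no payload field at all, so user traffic cannot end up in the database by accident.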

Google surveys Wi-Fi networks for the same basic reason Skyhook does – to provide an additional way, beyond GPS and cell tower triangulation, for phones to determine their locations.

In active scanning, Wi-Fi surveyors driving down a public street send out probe requests that ask every Wi-Fi access point within range to respond. This happens very quickly. The downside is that if an in-range access point happens to be busy – say, helping its owner download email – it won’t respond to the probe request, so the surveyors will miss that network.

The way around that problem is to use passive sniffing, which picks up all of the traffic travelling over active Wi-Fi networks, including key identifiers such as SSIDs (network names) and MAC addresses (similar to serial numbers, these are unique to each Wi-Fi router). The downside of passive sniffing is that it’s slower than active scanning, since routers may be broadcasting on any of a dozen channels, and each must be sniffed individually. “And you have to make sure you do not capture any of the network messages,” says Morgan.

Which is just what happened.

Google’s sniffing blunder

Google disclosed last Friday that its Street View cars had mistakenly collected data about the websites users were visiting on open wireless Internet networks.

Alan Eustace, a senior executive in Google’s engineering and research department, apologised for the mistake in a blog post and said the company was working with regulators to dispose of the data.

He said the company had stopped its Street View cars, which are used to gather information for Google’s mapping service, from collecting Wi-Fi data entirely.

Eustace also stressed that the data was only collected from networks that were not password protected, and that it was never used “in any Google products.”

I think it’s fair to say the apologies didn’t go a long way to appease the growing mistrust in all things Google…

Navizon is the positioning technology provider that holds the biggest share of global Cell-IDs, mapped at 7 million, and the widest global coverage of Wi-Fi APs.

Cyril Houri, Navizon’s founder, quite aptly said out loud what everybody was thinking: “This story is totally astonishing. The data that Google was collecting in secret (the network activity) has nothing to do with the Wi-Fi information required for geolocation. And storing and maintaining the data cannot be done by accident over a period of four years since it requires massive storage space.”
