Collecting telemetry about users and their actions on the web and in applications is the price of using “free” services on the Internet. Users pay with their attention and the time they spend in these services and applications. Mobile platform manufacturers, IT companies, and social networks are therefore interested in obtaining as much data from devices as possible.
While individual pieces of telemetry may not pose a threat to a specific person, in combination they can reveal a lot about that person: their interests, family, and work. At the same time, the constant connection between devices and the infrastructure of IT giants creates a potential threat of confidential information leaking, or of the surroundings being monitored through smartphone microphones and cameras.
This article discusses what specific information a smartphone transmits to vendors’ servers.
Initial conditions and tools
The test device was a Samsung SM-A505FN smartphone (trade name Samsung Galaxy A50) with its firmware reset to factory settings, the Russian software package installed, and no synchronized accounts.
At the time of traffic collection, the smartphone was not used (except for launching the camera, calendar, messages, contacts, and calculator). Most requests were made in the background.
BurpSuite and Wireshark were used to investigate the traffic, and base64, gzip, and xxd were used when working with archives.
Analysis of traffic to Google resources
First, DNS queries were analyzed.
The capture shows which domains the device queries and the DNS server responses with the IP addresses for each requested domain:
- time.google.com
- mtalk.google.com
- www.google.com
- youtubei.google.com
- www.googleapis.com
- android.clients.google.com
- play-lh.googleapis.com
- play-lh.googleusercontent.com
Clearly, these are not all the hosts the smartphone wants to communicate and share data with, but this article focuses on *google* and, to a lesser extent, on *samsung* and *yandex*.
Note that although the smartphone resolves this many domains, it does not exchange meaningful data with all of them. Some services are needed only to check the Internet connection or to synchronize the time.
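The vendor filtering described above can be sketched in a few lines of Python. The domain list is the one given in this article; the `VENDORS` tuple and the `group_by_vendor` helper are illustrative names, and the substring match is a deliberately simple assumption.

```python
from collections import defaultdict

# Domains taken from the DNS queries listed in the article.
QUERIED = [
    "time.google.com",
    "mtalk.google.com",
    "www.google.com",
    "youtubei.google.com",
    "www.googleapis.com",
    "android.clients.google.com",
    "play-lh.googleapis.com",
    "play-lh.googleusercontent.com",
]

# Hypothetical keyword list: the vendors this article focuses on.
VENDORS = ("google", "samsung", "yandex")


def group_by_vendor(domains, vendors=VENDORS):
    """Group domain names by the first vendor keyword found in them."""
    groups = defaultdict(list)
    for name in domains:
        lowered = name.lower().rstrip(".")
        vendor = next((v for v in vendors if v in lowered), "other")
        groups[vendor].append(lowered)
    return dict(groups)


# example.org is a placeholder for any domain outside the three vendors.
groups = group_by_vendor(QUERIED + ["example.org"])
```

A substring match is crude (it would also catch a look-alike domain), so matching on the registered domain would be the more careful choice for a real capture.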
This is how the smartphone checks the Internet connection:
It also synchronizes its clock with time.android.com or *.pool.ntp.org.
The first *google* domain in BurpSuite’s request history is crashlyticsreports-pa.googleapis.com.
The body of the POST request appears to contain random unreadable characters. A closer look at the request shows gzip named in its encoding headers (Content-Encoding is the header that describes the body), which means the body is a compressed archive and very likely has something in it.
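The *.pool.ntp.org hosts speak NTP, so a time synchronization exchange carries no device data beyond the request itself. As a rough illustration, here is a minimal sketch of a simple NTP (SNTP) client packet and of reading the server's transmit timestamp. The protocol version used by the phone is an assumption (version 3 is shown), and the reply is hand-made so that the example runs offline.

```python
import struct

NTP_UNIX_DELTA = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request():
    """Return a 48-byte SNTP client request (LI=0, version=3, mode=3)."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet):
    """Return the server's transmit timestamp as Unix seconds (float)."""
    if len(packet) < 48:
        raise ValueError("an SNTP packet is at least 48 bytes long")
    # The transmit timestamp occupies bytes 40..47: seconds and fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


# Offline demonstration with a hand-made reply instead of a real server.
fake_reply = bytearray(48)
fake_reply[0] = 0x1C  # LI=0, version=3, mode=4 (server)
struct.pack_into("!II", fake_reply, 40, NTP_UNIX_DELTA + 1_600_000_000, 0)
unix_time = parse_transmit_time(bytes(fake_reply))
```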
To extract the data from the archive without damaging it, the best approach is to select everything from the eleventh line to the end and encode it in base64.
The beginning of the resulting string shows the characteristic set of bytes of a base64-encoded “.gz” archive. A JSON file can be extracted from it.
The Firebase service collected information about the device (model number, chipset, model, and firmware name), followed by information about application errors: specifically, the methods in which each failure occurred and its severity level.
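The same step can be done programmatically instead of by selecting lines by hand. The sketch below assumes the raw request has been saved as bytes; it splits it at the blank line that ends the headers and base64-encodes the body so the binary data survives copying. The request path and payload are made up for the example.

```python
import base64
import gzip


def body_as_base64(raw_request: bytes) -> str:
    """Split a raw HTTP request at the blank line and base64-encode the body."""
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line between headers and body")
    return base64.b64encode(body).decode("ascii")


# Hypothetical request: the path and payload are invented for illustration.
payload = gzip.compress(b'{"example": true}')
raw = (
    b"POST /hypothetical/path HTTP/1.1\r\n"
    b"Host: crashlyticsreports-pa.googleapis.com\r\n"
    b"Content-Encoding: gzip\r\n"
    b"\r\n"
) + payload
encoded = body_as_base64(raw)
```

Splitting at the first blank line avoids counting header lines, which can differ from request to request.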
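A gzip stream starts with the bytes 1f 8b 08, which base64 encodes as `H4sI`, so that prefix is a quick way to recognize such archives. Here is a minimal sketch of checking the prefix and unpacking the JSON; the helper name and the sample field names are hypothetical and not taken from the captured report.

```python
import base64
import gzip
import json

GZIP_B64_PREFIX = "H4sI"  # base64 of the gzip magic bytes 1f 8b 08


def json_from_b64_gzip(text: str):
    """Decode base64, gunzip the result and parse it as JSON."""
    if not text.startswith(GZIP_B64_PREFIX):
        raise ValueError("does not look like a base64-encoded gzip stream")
    return json.loads(gzip.decompress(base64.b64decode(text)))


# Hypothetical report: the field names are invented for the example.
sample = {"device": "SM-A505FN", "events": []}
blob = base64.b64encode(gzip.compress(json.dumps(sample).encode())).decode()
report = json_from_b64_gzip(blob)
```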
The next “Google” request was this one:
Here, the request sends the application name and version, the Android version, and an identifier, along with the “usage_tracking_enabled=0” parameter. Apparently this is an advertising identifier used to show ads in YouTube.
Here, the smartphone checks whether the system WebView is present and needs to be updated:
And here is another request with an archive, this time to play.googleapis.com.
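Parameters like these are easy to pull out of a captured URL. In the sketch below only `usage_tracking_enabled=0` comes from the capture; the host, path, and all other parameter names are placeholders, since the original request is not reproduced here.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: only usage_tracking_enabled=0 comes from the article;
# the host, path and other parameter names are placeholders.
url = (
    "https://example.invalid/hypothetical/endpoint"
    "?app=com.example.app&app_version=1.0&os_version=11"
    "&id=00000000-0000-0000-0000-000000000000&usage_tracking_enabled=0"
)


def query_params(url: str) -> dict:
    """Return query parameters as a flat dict (first value wins)."""
    query = urlsplit(url).query
    return {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}


params = query_params(url)
tracking_enabled = params.get("usage_tracking_enabled") == "1"
```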
To get the contents of the archive, you need to follow the same steps as above.
This time the data is in a different format. Again, device information comes first, and then, among the ASCII characters, there are distinct mentions of some applications.
Apart from the readable package names, the rest looks like unique identifiers, most likely the identifiers of other applications. After this request there are two more with the same meaning but different values.
And this request from the smartphone already contains an advertising identifier:
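Readable package names can be pulled out of such a binary body in the same way the Unix `strings` tool does. This is a minimal sketch under two assumptions: that names of interest are dotted lowercase identifiers with at least three parts, and that they sit in runs of printable ASCII. The blob is made up for the example.

```python
import re

# Runs of 6+ printable ASCII bytes, similar to the Unix `strings` tool.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")
# Dotted lowercase names such as com.google.android.gms.
PACKAGE_NAME = re.compile(rb"[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*){2,}")


def find_package_names(blob: bytes):
    """Return package-like names found in printable runs of a binary blob."""
    found = []
    for run in PRINTABLE_RUN.findall(blob):
        for match in PACKAGE_NAME.findall(run):
            name = match.decode("ascii")
            if name not in found:
                found.append(name)
    return found


# Made-up blob: binary noise around two package names.
blob = (
    b"\x08\x01\x12\x16com.google.android.gms\x00\x9f"
    b"\x02\x0fcom.example.app\x18\x07"
)
names = find_package_names(blob)
```

The opaque identifiers between the names are left alone; without the message schema they can only be guessed at.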
The value of the rdid parameter is the advertising identifier assigned to the device.
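A quick way to confirm that a captured `rdid` value has the shape of an advertising identifier is to parse it as a UUID. The sketch assumes the identifier uses the usual UUID text form; the query string and the identifier in it are made up.

```python
import uuid
from urllib.parse import parse_qs


def extract_rdid(query: str):
    """Return the rdid value as a UUID, or None if absent or malformed."""
    values = parse_qs(query).get("rdid")
    if not values:
        return None
    try:
        return uuid.UUID(values[0])
    except ValueError:
        return None


# Hypothetical query string with a made-up identifier.
rdid = extract_rdid("rdid=12345678-1234-5678-1234-567812345678&other=1")
```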
More archives go to the already familiar play.googleapis.com, but to a different endpoint (/log/batch), and they are no longer gzip: this time Brotli was used for compression, as indicated by the Content-Encoding header.
As always, everything begins with basic information about the device, but applications with versions and tokens appear further on.
Information about another error is provided:
Here is information about an error in a method that occurred while an application was running:
Information containing some global values is also transmitted:
All of this was sent by the com.google.android.gms package.
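Since different requests use different compression, it helps to pick the decoder from the encoding named in the request headers. The sketch below handles gzip and deflate with the standard library; for Brotli it assumes the third-party `brotli` package is installed, because Python ships no Brotli decoder. The `decode_body` name is illustrative.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Decompress an HTTP body according to its Content-Encoding value."""
    encoding = content_encoding.strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        return zlib.decompress(body)
    if encoding == "br":
        # Assumption: the third-party "brotli" package is installed;
        # the Python standard library has no Brotli decoder.
        import brotli
        return brotli.decompress(body)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")


# Round trip with gzip, which needs nothing beyond the standard library.
restored = decode_body(gzip.compress(b"device info"), "gzip")
```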
Analysis of traffic to Samsung and Yandex resources
When the standard camera application was started, the smartphone immediately sent a request to a Samsung server with the name of the application that had been opened. Apparently, the device decided to check for updates.
In one of the following requests, an archive with unknown content is sent:
The data from the archive looks more like a set of parameters.
The Samsung Galaxy Store page was also loaded, although no one asked for it.
Soon a request to Yandex appeared in the traffic. In addition to a bunch of parameters, the request also contains an archive.
The request to the Yandex server does not indicate what is used for compression, and the first bytes of the message from the request body do not make it clear how the data was compressed, so it was not possible to look inside. But another request uses gzip for compression. The compressed data can be examined.
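When no header names the compression, the first bytes of the body are the next thing to check. Here is a minimal sketch of that check for a few common formats; the signature table is not exhaustive, and a raw Brotli stream or an encrypted payload has no fixed signature, so it cannot be recognized this way.

```python
import bz2
import gzip
import lzma
import zlib

# Leading "magic" bytes of some common compressed formats.
SIGNATURES = [
    (b"\x1f\x8b", "gzip"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"BZh", "bzip2"),
]


def guess_compression(data: bytes) -> str:
    """Guess the compression format from the first bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # A zlib stream has no fixed magic, but its first byte is usually 0x78
    # and the first two bytes, read as a big-endian number, divide by 31.
    if len(data) >= 2 and data[0] == 0x78 and (data[0] << 8 | data[1]) % 31 == 0:
        return "zlib"
    return "unknown"


guesses = {
    "gzip": guess_compression(gzip.compress(b"x")),
    "zlib": guess_compression(zlib.compress(b"x")),
    "bzip2": guess_compression(bz2.compress(b"x")),
    "xz": guess_compression(lzma.compress(b"x")),
}
```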
This archive contains a standard “header” and a large number of parameters. The parameters shown in the figure below are taken from the middle of the archive as an example.
Besides a few services of unclear ownership, Facebook and Microsoft were very active. They were among the first to “feel” the presence of the Internet and considered it their sacred duty to immediately notify every possible server.
Given all of the above, we can say that smartphones lead a busy life. Everything described in this article came from an almost stock phone without synced accounts. At the time of the traffic collection, I did not use the phone (except for launching the camera, calendar, messages, contacts, and calculator); it was just lying idle, and most of the requests were made in the background. Geolocation was not enabled; I would venture to suggest that the results would be no less interesting with it.