This chapter explains the concepts involved in performing digital forensics on mobile devices with Python.
Mobile device forensics is the branch of digital forensics that deals with the acquisition and analysis of mobile devices to recover digital evidence of investigative interest. It differs from computer forensics because mobile devices have built-in communication systems that can provide useful information, such as location data.
Though the use of smartphones in digital forensics is increasing day by day, the discipline is still considered non-standard because of the heterogeneity of the devices. Computer hardware, such as a hard disk, on the other hand, is considered standard, and its examination has developed into a stable discipline. In the digital forensics industry, there is much debate about the techniques used for non-standard devices with transient evidence, such as smartphones.
Modern mobile devices hold a great deal of digital information compared with older phones, which stored only a call log or SMS messages. Thus, mobile devices can supply investigators with many insights about their users. Some artifacts that can be extracted from mobile devices are mentioned below −
Messages − These are useful artifacts that can reveal the state of mind of the owner and can even give the investigator previously unknown information.

Location History − Location history data is a useful artifact that investigators can use to validate the whereabouts of a person at a particular time.

Applications Installed − By looking at the kinds of applications installed, the investigator gets some insight into the habits and thinking of the mobile user.
Smartphones have SQLite databases and PLIST files as their major sources of evidence. In this section we are going to process these sources of evidence in Python.
A PLIST (Property List) is a flexible and convenient format for storing application data, especially on iPhone devices. It uses the extension .plist. Such files are used to store information about bundles and applications, and they come in two formats: XML and binary. The following Python code will open and read a PLIST file. Note that before proceeding, we must create our own Info.plist file.
First, install a third-party library named biplist with the following command −

pip install biplist
Now, import some useful libraries to process PLIST files −
import biplist
import os
import sys
Now, the following code in the main() method can be used to read the PLIST file into a variable −
def main(plist):
    try:
        data = biplist.readPlist(plist)
    except (biplist.InvalidPlistException, biplist.NotBinaryPlistException) as e:
        print("[-] Invalid PLIST file - unable to be opened by biplist")
        sys.exit(1)
Now, we can either inspect the data from this variable or print it directly to the console.
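If you do not already have an Info.plist at hand, the following is a minimal sketch that first writes a small sample Info.plist with biplist and then reads it back and prints every key-value pair; the file name and the sample keys are purely illustrative −

import sys

import biplist

# Illustrative keys only; a real Info.plist comes from an application bundle
sample = {"CFBundleName": "SampleApp", "CFBundleVersion": "1.0"}
plist_path = "Info.plist"

# Write a sample binary plist so that the reader code has something to open
biplist.writePlist(sample, plist_path)

try:
    data = biplist.readPlist(plist_path)
except (biplist.InvalidPlistException, biplist.NotBinaryPlistException):
    print("[-] Invalid PLIST file - unable to be opened by biplist")
    sys.exit(1)

# Print every key-value pair found in the plist
for key, value in data.items():
    print("{}: {}".format(key, value))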
SQLite serves as the primary data repository on mobile devices. SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. Because it is zero-configuration, you need not configure it on your system, unlike other databases.
If you are a novice or unfamiliar with SQLite databases, you can follow the link www.howcodex.com/sqlite/index.htm. Additionally, you can follow the link www.howcodex.com/sqlite/sqlite_python.htm if you want to get into the details of using SQLite with Python.
During mobile forensics, we can interact with the sms.db file of a mobile device and extract valuable information from the message table. Python has a built-in library named sqlite3 for connecting to SQLite databases. You can import it with the following command −
import sqlite3
Now, with the help of the following commands, we can connect to the database, say sms.db in the case of mobile devices −
conn = sqlite3.connect('sms.db')
c = conn.cursor()
Here, c is the cursor object with the help of which we can interact with the database.
Now, suppose we want to execute a particular command, say to get the details from a table named abc; it can be done with the help of the following command −
c.execute("SELECT * FROM abc")
The result of the above command is stored in the cursor object. Similarly, we can use the fetchall() method to dump the result into a variable we can manipulate, and call c.close() when we are finished with the cursor.
We can use the following commands to get the column names of the message table in sms.db −
c.execute("pragma table_info(message)")
table_data = c.fetchall()
columns = [x[1] for x in table_data]
Observe that here we are using the SQLite PRAGMA command, a special command used to control various environment variables and state flags within the SQLite environment. In the above code, the fetchall() method returns a list of tuples, and each column's name is stored at index 1 of each tuple.
Now, with the help of the following commands, we can query the table for all of its data and store it in the variable named data_msg −
c.execute("SELECT * FROM message")
data_msg = c.fetchall()
The above commands will store the data in the variable; further, we can write this data to a CSV file by using the csv.writer() method.
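As a minimal sketch (assuming columns and data_msg were populated as above, and using an illustrative output file name of sms_messages.csv), the messages can be written to a CSV file as follows −

import csv

with open("sms_messages.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(columns)     # header row built from the column names
    writer.writerows(data_msg)   # one row per record of the message table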
iPhone forensics can be performed on the backups made by iTunes. Forensic examiners rely on analyzing the iPhone logical backups acquired through iTunes. The AFC (Apple File Connection) protocol is used by iTunes to take the backup. Moreover, the backup process does not modify anything on the iPhone except the escrow key records.
Now, the question arises: why is it important for a digital forensics expert to understand iTunes backup techniques? It is important because we may get access to the suspect's computer instead of the iPhone itself, and when a computer is used to sync with an iPhone, most of the information on the iPhone is likely to be backed up on that computer.
Whenever an Apple device is backed up to a computer, it is synced with iTunes, and a specific folder named after the device's unique ID is created. In the latest backup format, the files are stored in subfolders named after the first two hexadecimal characters of the file name, as illustrated in the sketch after the table below. Among these backup files, some, such as Info.plist, are useful, along with the database named Manifest.db. The following table shows the backup locations, which vary with the operating system on which iTunes runs −
| OS | Backup Location |
| --- | --- |
| Win7 | C:\Users\[username]\AppData\Roaming\Apple Computer\MobileSync\Backup\ |
| MAC OS X | ~/Library/Application Support/MobileSync/Backup/ |
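As a minimal sketch of the layout described above (the device folder name and the file ID below are purely illustrative), the on-disk path of a backed-up file in the newer format can be derived from the first two hexadecimal characters of its file ID −

import os

backup_root = os.path.expanduser("~/Library/Application Support/MobileSync/Backup")
device_udid = "0123456789abcdef0123456789abcdef01234567"    # illustrative 40-character device folder
file_id = "3d0d7e5fb2ce288813306e4d4636395e047a3d28"         # illustrative SHA-1 file ID

# <backup root>/<device folder>/<first two hex characters of the file ID>/<file ID>
file_path = os.path.join(backup_root, device_udid, file_id[:2], file_id)
print(file_path)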
For processing an iTunes backup with Python, we first need to identify all the backups in the backup location of our operating system. Then we will iterate through each backup and read the Manifest.db database. The following Python code does exactly that −
First, import the necessary libraries as follows −
from __future__ import print_function
import argparse
import logging
import os
from shutil import copyfile
import sqlite3
import sys

logger = logging.getLogger(__name__)
Now, provide two positional arguments, namely INPUT_DIR and OUTPUT_DIR, which represent the iTunes backup folder and the desired output folder respectively −
if __name__ == "__main__": parser.add_argument("INPUT_DIR",help = "Location of folder containing iOS backups, ""e.g. ~\Library\Application Support\MobileSync\Backup folder") parser.add_argument("OUTPUT_DIR", help = "Output Directory") parser.add_argument("-l", help = "Log file path",default = __file__[:-2] + "log") parser.add_argument("-v", help = "Increase verbosity",action = "store_true") args = parser.parse_args()
Now, setup the log as follows −
if args.v:
    logger.setLevel(logging.DEBUG)
else:
    logger.setLevel(logging.INFO)
Now, setup the message format for this log as follows −
msg_fmt = logging.Formatter("%(asctime)-15s %(funcName)-13s "
                            "%(levelname)-8s %(message)s")
strhndl = logging.StreamHandler(sys.stderr)
strhndl.setFormatter(fmt=msg_fmt)
fhndl = logging.FileHandler(args.l, mode='a')
fhndl.setFormatter(fmt=msg_fmt)
logger.addHandler(strhndl)
logger.addHandler(fhndl)

logger.info("Starting iBackup Visualizer")
logger.debug("Supplied arguments: {}".format(" ".join(sys.argv[1:])))
logger.debug("System: " + sys.platform)
logger.debug("Python Version: " + sys.version)
The following lines of code will create the necessary folders for the desired output directory by using the os.makedirs() function −
if not os.path.exists(args.OUTPUT_DIR):
    os.makedirs(args.OUTPUT_DIR)
Now, pass the supplied input and output directories to the main() function as follows −
if os.path.exists(args.INPUT_DIR) and os.path.isdir(args.INPUT_DIR):
    main(args.INPUT_DIR, args.OUTPUT_DIR)
else:
    logger.error("Supplied input directory does not exist or is not "
                 "a directory")
    sys.exit(1)
Now, write the main() function, which will call the backup_summary() function to identify all the backups present in the input folder −
def main(in_dir, out_dir):
    backups = backup_summary(in_dir)

def backup_summary(in_dir):
    logger.info("Identifying all iOS backups in {}".format(in_dir))
    root = os.listdir(in_dir)
    backups = {}
    for x in root:
        temp_dir = os.path.join(in_dir, x)
        if os.path.isdir(temp_dir) and len(x) == 40:
            num_files = 0
            size = 0
            for root, subdir, files in os.walk(temp_dir):
                num_files += len(files)
                size += sum(os.path.getsize(os.path.join(root, name))
                            for name in files)
            backups[x] = [temp_dir, num_files, size]
    return backups
Now, print the summary of each backup to the console as follows −
print("Backup Summary") print("=" * 20) if len(backups) > 0: for i, b in enumerate(backups): print("Backup No.: {} \n""Backup Dev. Name: {} \n""# Files: {} \n""Backup Size (Bytes): {}\n".format(i, b, backups[b][1], backups[b][2]))
Now, dump the contents of the Manifest.db file to the variable named db_items.
        try:
            db_items = process_manifest(backups[b][0])
        except IOError:
            logger.warn("Non-iOS 10 backup encountered or "
                        "invalid backup. Continuing to next backup.")
            continue
Now, let us define a function that will take the directory path of the backup −
def process_manifest(backup):
    manifest = os.path.join(backup, "Manifest.db")
    if not os.path.exists(manifest):
        logger.error("Manifest DB not found in {}".format(manifest))
        raise IOError
Now, using sqlite3, we will connect to the Manifest.db database and read its Files table through a cursor named c −
    conn = sqlite3.connect(manifest)
    c = conn.cursor()
    items = {}
    for row in c.execute("SELECT * from Files;"):
        items[row[0]] = [row[2], row[1], row[3]]
    return items

The remaining statements belong back inside the backup loop of main(): create_files() is called for each valid backup, and the script exits when no valid backups are found −

        create_files(in_dir, out_dir, b, db_items)
        print("=" * 20)
else:
    logger.warning("No valid backups found. The input directory should be "
                   "the parent-directory immediately above the SHA-1 hash "
                   "iOS device backups")
    sys.exit(2)
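The code above assumes that the fileID, domain and relativePath values occupy the first columns of the Files table. A minimal sketch for checking that assumption, using the same PRAGMA technique shown earlier and assuming a copy of Manifest.db is available in the working directory −

import sqlite3

conn = sqlite3.connect("Manifest.db")
c = conn.cursor()
c.execute("pragma table_info(Files)")
for column in c.fetchall():
    # Each tuple is (cid, name, type, notnull, dflt_value, pk)
    print(column[0], column[1], column[2])
conn.close()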
Now, define the create_files() method as follows −
def create_files(in_dir, out_dir, b, db_items):
    msg = "Copying Files for backup {} to {}".format(
        b, os.path.join(out_dir, b))
    logger.info(msg)
    files_not_found = 0   # counter for files listed in Manifest.db but missing on disk
Now, iterate through each key in the db_items dictionary −
    for x, key in enumerate(db_items):
        if db_items[key][0] is None or db_items[key][0] == "":
            continue
        else:
            dirpath = os.path.join(out_dir, b,
                                   os.path.dirname(db_items[key][0]))
            filepath = os.path.join(out_dir, b, db_items[key][0])
            if not os.path.exists(dirpath):
                os.makedirs(dirpath)
            original_dir = b + "/" + key[0:2] + "/" + key
            path = os.path.join(in_dir, original_dir)
            if os.path.exists(filepath):
                filepath = filepath + "_{}".format(x)
Now, use the shutil.copyfile() method to copy each backed-up file as follows −
            try:
                copyfile(path, filepath)
            except IOError:
                logger.debug("File not found in backup: {}".format(path))
                files_not_found += 1
    if files_not_found > 0:
        logger.warning("{} files listed in the Manifest.db not "
                       "found in backup".format(files_not_found))
    copyfile(os.path.join(in_dir, b, "Info.plist"),
             os.path.join(out_dir, b, "Info.plist"))
    copyfile(os.path.join(in_dir, b, "Manifest.db"),
             os.path.join(out_dir, b, "Manifest.db"))
    copyfile(os.path.join(in_dir, b, "Manifest.plist"),
             os.path.join(out_dir, b, "Manifest.plist"))
    copyfile(os.path.join(in_dir, b, "Status.plist"),
             os.path.join(out_dir, b, "Status.plist"))
With the above Python script, we get the recreated backup file structure in our output folder. For encrypted backups, the pycrypto Python library can be used to decrypt them.
Mobile devices connect to the outside world through Wi-Fi networks, which are available almost everywhere. Sometimes the device connects to these open networks automatically.
In the case of an iPhone, the list of open Wi-Fi connections to which the device has connected is stored in a PLIST file named com.apple.wifi.plist. This file contains the Wi-Fi SSID, BSSID and connection time.
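A minimal sketch of reading this file with biplist is shown below; the key "List of known networks" and the per-network field names are assumptions about the plist layout and may differ between iOS versions −

import biplist

plist_path = "com.apple.wifi.plist"   # as extracted from the device or its backup
wifi_data = biplist.readPlist(plist_path)

# "List of known networks" is assumed to hold one dictionary per remembered network
for network in wifi_data.get("List of known networks", []):
    # SSID_STR, BSSID and lastJoined are assumed field names
    print(network.get("SSID_STR"), network.get("BSSID"), network.get("lastJoined"))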
We need to extract the Wi-Fi details from a standard Cellebrite XML report using Python. For this, we will use the API of the Wireless Geographic Logging Engine (WIGLE), a popular platform that can be used to find the location of a device using the names of Wi-Fi networks.
We can use the Python library named requests to access the WIGLE API. It can be installed as follows −
pip install requests
We need to register on WIGLE's website https://wigle.net/account to get a free API key from WIGLE. The Python script for getting information about the user's device and its connections through WIGLE's API is discussed below −
First, import the following libraries for handling different things −
from __future__ import print_function
import argparse
import csv
import os
import sys
import xml.etree.ElementTree as ET

import requests
Now, provide two positional arguments, namely INPUT_FILE and OUTPUT_CSV, which represent the input file with the Wi-Fi MAC addresses and the desired output CSV file respectively −
if __name__ == "__main__": parser.add_argument("INPUT_FILE", help = "INPUT FILE with MAC Addresses") parser.add_argument("OUTPUT_CSV", help = "Output CSV File") parser.add_argument("-t", help = "Input type: Cellebrite XML report or TXT file",choices = ('xml', 'txt'), default = "xml") parser.add_argument('--api', help = "Path to API key file",default = os.path.expanduser("~/.wigle_api"), type = argparse.FileType('r')) args = parser.parse_args()
The following lines of code check whether the input file exists and is a file; if not, the script exits. They also create the output directory if needed and read the API key from the key file −
if not os.path.exists(args.INPUT_FILE) or \
        not os.path.isfile(args.INPUT_FILE):
    print("[-] {} does not exist or is not a file".format(args.INPUT_FILE))
    sys.exit(1)

directory = os.path.dirname(args.OUTPUT_CSV)
if directory != '' and not os.path.exists(directory):
    os.makedirs(directory)

api_key = args.api.readline().strip().split(":")
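Since the script reads the API name and token from a single name:token line, the key file can be prepared as in the following minimal sketch (the values shown are placeholders for your own WIGLE credentials) −

import os

# Write the WIGLE API name and token, separated by a colon, on a single line
with open(os.path.expanduser("~/.wigle_api"), "w") as key_file:
    key_file.write("YourAPIName:YourAPIToken")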
Now, pass the arguments to main() as follows −
main(args.INPUT_FILE, args.OUTPUT_CSV, args.t, api_key)

def main(in_file, out_csv, type, api_key):
    if type == 'xml':
        wifi = parse_xml(in_file)
    else:
        wifi = parse_txt(in_file)
    query_wigle(wifi, out_csv, api_key)
Now, we will parse the XML file as follows −
def parse_xml(xml_file):
    wifi = {}
    xmlns = "{http://pa.cellebrite.com/report/2.0}"
    print("[+] Opening {} report".format(xml_file))
    xml_tree = ET.parse(xml_file)
    print("[+] Parsing report for all connected WiFi addresses")
    root = xml_tree.getroot()
Now, iterate through the child elements of the root as follows −
    for child in root.iter():
        if child.tag == xmlns + "model":
            if child.get("type") == "Location":
                for field in child.findall(xmlns + "field"):
                    if field.get("name") == "TimeStamp":
                        ts_value = field.find(xmlns + "value")
                        try:
                            ts = ts_value.text
                        except AttributeError:
                            continue
Now, we will check whether the 'SSID' string is present in the value's text −
if "SSID" in value.text: bssid, ssid = value.text.split("\t") bssid = bssid[7:] ssid = ssid[6:]
Now, we need to add BSSID, SSID and timestamp to the wifi dictionary as follows −
                        if bssid in wifi.keys():
                            wifi[bssid]["Timestamps"].append(ts)
                            wifi[bssid]["SSID"].append(ssid)
                        else:
                            wifi[bssid] = {"Timestamps": [ts], "SSID": [ssid],
                                           "Wigle": {}}
    return wifi
The text parser, which is much simpler than the XML parser, is shown below −
def parse_txt(txt_file):
    wifi = {}
    print("[+] Extracting MAC addresses from {}".format(txt_file))
    with open(txt_file) as mac_file:
        for line in mac_file:
            wifi[line.strip()] = {"Timestamps": ["N/A"], "SSID": ["N/A"],
                                  "Wigle": {}}
    return wifi
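Both parsers build the same dictionary structure, keyed on the BSSID/MAC address; a hypothetical example of that structure, with made-up values, is shown below −

wifi = {
    "aa:bb:cc:dd:ee:ff": {
        "Timestamps": ["2020-01-01 10:00:00"],
        "SSID": ["CoffeeShopWiFi"],
        "Wigle": {},   # filled in later with the WIGLE API response
    }
}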
Now, let us use the requests module to make WIGLE API calls; for this we move on to the query_wigle() method −
def query_wigle(wifi_dictionary, out_csv, api_key):
    print("[+] Querying Wigle.net through Python API for {} "
          "APs".format(len(wifi_dictionary)))
    for mac in wifi_dictionary:
        wigle_results = query_mac_addr(mac, api_key)

def query_mac_addr(mac_addr, api_key):
    query_url = "https://api.wigle.net/api/v2/network/search?" \
                "onlymine=false&freenet=false&paynet=false" \
                "&netid={}".format(mac_addr)
    req = requests.get(query_url, auth=(api_key[0], api_key[1]))
    return req.json()
There is a daily limit on WIGLE API calls; if that limit is exceeded, an error is shown, as handled below −
        # The try/except block below continues inside the for loop of query_wigle()
        try:
            if wigle_results["resultCount"] == 0:
                wifi_dictionary[mac]["Wigle"]["results"] = []
                continue
            else:
                wifi_dictionary[mac]["Wigle"] = wigle_results
        except KeyError:
            if wigle_results["error"] == "too many queries today":
                print("[-] Wigle daily query limit exceeded")
                wifi_dictionary[mac]["Wigle"]["results"] = []
                continue
            else:
                print("[-] Other error encountered for "
                      "address {}: {}".format(mac, wigle_results['error']))
                wifi_dictionary[mac]["Wigle"]["results"] = []
                continue
    prep_output(out_csv, wifi_dictionary)
Now, we will use the prep_output() method to flatten the dictionary into easily writable chunks −
def prep_output(output, data):
    csv_data = {}
    google_map = "https://www.google.com/maps/search/"
Now, access all the data we have collected so far as follows −
    for x, mac in enumerate(data):
        for y, ts in enumerate(data[mac]["Timestamps"]):
            for z, result in enumerate(data[mac]["Wigle"]["results"]):
                shortres = data[mac]["Wigle"]["results"][z]
                g_map_url = "{}{},{}".format(google_map, shortres["trilat"],
                                             shortres["trilong"])
Now, we can write the output to a CSV file, as we have done in earlier scripts in this chapter, by using the write_csv() function.
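The write_csv() helper itself is not reproduced in this section; a minimal sketch of such a helper, assuming csv_data holds one flattened dictionary of fields per output row, could look like the following −

import csv

def write_csv(out_csv, csv_data):
    # Collect the union of all field names so that every row can be written
    fieldnames = sorted({field for row in csv_data.values() for field in row})
    with open(out_csv, "w", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in csv_data.values():
            writer.writerow(row)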