# Virtual surrounding impression tool

This tool generates VSI hashes on the client computers.

*** Please read before using! ***

*** Developers can skip chapter 5 and check chapter 6! ***

This folder contains sample code for a "virtual surrounding impression" generator as a Python script (VSI.py).

# CONTENTS

1. License
2. Introduction
3. Principles of operation
4. Security, privacy concerns
5. How to use the provided code
    5.1 Supported environments and requirements
    5.2 Use on Linux clients
    5.3 Use on Mac clients
    5.4 Use on Windows clients
    5.5 Troubleshooting
6. How to implement your own solution (<-- programmers read this)
7. Authors

# 1. LICENSE

While this is clearly only rough proof-of-concept demo code, you can use it freely under the GNU GPL v3 license.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

# 2. INTRODUCTION

This is a sample app that generates a virtual surrounding impression and works on all three major computer operating systems. It doesn't require any parameters and returns clean text output that can be piped directly into other programs.

If you have limited developer resources, you can call this script from within your survey software and save the returned text as a string variable.
If you have programming resources, please read chapter 6. It's not much work to implement it in your own solution.

If you choose to use this script, you can use PyInstaller to compile it into an executable program that doesn't need Python installed on the clients.
Please read more about the actual principles of operation in the included "How_VSI_works.pdf" file.

****************************************************************************

# 3. PRINCIPLES OF OPERATION

For more details, refer to the included "How_VSI_works.pdf". In essence, the script:

a) scans the surrounding wifi APs and retrieves their BSSIDs and signal strengths
b) sorts them in descending order of signal strength
c) takes the first five AP BSSIDs and their strengths, e.g.:
   12:34:56:78:90:ab 80%
   23:34:45:56:56:67 79%
d) salts each BSSID and each strength separately with the machine UUID and username. When fewer than five APs are visible, DO NOT SALT EMPTY BSSIDs.
e) makes an 8-character hash of each salted BSSID and signal strength SEPARATELY.

5 APs with two data points each (BSSID, power) generate 5*2 = 10 hashes of 8 characters each. 10*8 = 80, hence the 80-character VSI code.

****************************************************************************

# 4. SECURITY, PRIVACY

Each computer salts differently for each user (128-bit UUID plus username), so the hashes should be safe from reverse lookups using rainbow tables.
This script uses xxhash and CRC32, but you can use anything else, e.g. xxhsum -H0, fletcher-32, adler32, or compute a SHA-1 and take 8 characters of it.

In practice, nobody is going to alter wifi names to fake a different location, as it is much easier to simply turn the WiFi off.

Collisions are possible, but not critical for the given usage. Please refer to the included "How_VSI_works.pdf" for more; you can also read a word or two about collisions here:
https://preshing.com/20110504/hash-collision-probabilities/

****************************************************************************

# 5. HOW TO USE THE PROVIDED CODE

Your survey software should call this procedure three times during a survey; we suggest implementing it as a hidden string variable at fixed locations in the survey (e.g. after the first block, the middle block and the last block).
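For survey software that can run Python, capturing the script's output as a string might look like the minimal sketch below. The helper name `read_vsi` and the stand-in command are assumptions made for illustration only; in a real survey the command would point at the provided script (for example `[sys.executable, "VSI.py"]`) or at a PyInstaller binary.

```python
import subprocess
import sys


def read_vsi(command, timeout=30):
    """Run `command` and return its standard output as a stripped string.

    `command` is a list such as [sys.executable, "VSI.py"] (hypothetical
    path; adjust it to wherever the script or binary actually lives).
    """
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        timeout=timeout,      # a wifi scan can take a few seconds
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()


# Stand-in command so the sketch runs anywhere: it prints a fake
# 80-character string instead of scanning any wifi networks.
demo_command = [sys.executable, "-c", "print('0a1b2c3d' * 10)"]
vsi = read_vsi(demo_command)
print(vsi)
```

The returned string can then be stored in the hidden survey variable at each of the three points in the survey.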
We suggest doing it in a parallel thread, as scanning networks can take a couple of seconds.
If that is not possible, we suggest scheduling this script to run every 10 minutes and storing the result in a temporary text file, ideally in volatile memory (RAM), so the contents are removed upon reboot or shutdown. Then simply read the text file contents from the survey.

Scheduling command on Linux or Mac: `python3 SampleLocator.py > /tmp/VSI.txt`

Scheduling command on Windows: `python3 SampleLocator.py > %tmp%\VSI.txt`

** NOTE: when using temporary files, do NOT store hashes permanently in nonvolatile memory. Do not store more than a single (the last) hash.

Reading the text into a survey (if the software supports system commands as a variable input):

Linux, Mac: `cat /tmp/VSI.txt`

Windows: `type %tmp%\VSI.txt`

----------------------------------------------------------------------------

## 5.1 SUPPORTED ENVIRONMENTS AND REQUIREMENTS

This script works on:

- Windows (7+)
- Linux (nmcli)
- MacOS, OS X (2010+)

Required software:
If you don't compile the script into a binary for the target platform (using PyInstaller), you must install Python 3 to interpret it.

----------------------------------------------------------------------------

## 5.2 USE ON LINUX CLIENTS

1. Make sure you have Python 3 installed
2. Run the script

Hint: you can produce the same result without Python and this script by running this one-liner (install xxhash to make such hashes in the CLI):

`salt=$(cat /etc/machine-id)$(whoami) && for a in $(nmcli -f BSSID,SIGNAL device wifi list --rescan yes | awk -v s=$salt '{print $1 s "\n" $2 s}'); do xxhsum -H0 <(echo $a); done | cut -d ' ' -f 1 | tail -n +2 | head -n 10 | xargs echo | sed 's/ //g'`

----------------------------------------------------------------------------

## 5.3 USE ON MAC CLIENTS

1. Install Python 3 from the App Store
2. Allow the user to run airport -s (use sudo)
3. Run the script

Hint: you can produce the same result without Python and this script by running this one-liner (install xxhash to make hashes in the CLI):

`salt=$(ioreg -d2 -c IOPlatformExpertDevice | awk -F\" '/IOPlatformUUID/{print $(NF-1)}')$(whoami) && for a in $(sudo /System/Library/PrivateFrameworks/Apple80211.framework/Versions/A/Resources/airport -s | perl -nle 'm/(?<=\s)[0-9a-f]{2}(:[0-9a-f]{2}){5}\s+-?[[:digit:]]{2}/ and print "$&"' | sed '1!G;h;$!d' | awk -v s=$salt '{print $1 s "\n" $2 s}'); do xxhsum -H0 <(echo $a); done | cut -d ' ' -f 1 | tail -n +2 | head -n 10 | xargs echo | sed 's/ //g'`

----------------------------------------------------------------------------

## 5.4 USE ON WINDOWS CLIENTS

Install Python 3 from the MS Store (search for python, select version 3.x).

Windows cannot rescan wifi from the command line without admin privileges (by turning wifi off and on), which makes the list of networks unreliable. Additionally, cmd doesn't have a sudo :-)

To resolve this, you can find a precompiled "wifi.exe" from https://github.com/changyuheng/winwifi in the "Windows tools" subdirectory.
If it doesn't work for some reason, get it via pipx: open the command line (WIN+R, type "cmd.exe") and execute a-c:

a) `python -m pip install --user pipx`

b) `python -m pipx install winwifi`

c) `python -m pipx ensurepath`

Then you're ready to go. Instead of ensuring the path (c), you can just copy "wifi.exe" to this script's directory.

----------------------------------------------------------------------------

## 5.5 TROUBLESHOOTING

PERMISSION DENIED on Linux or Mac
Allow the user to run "airport" (Mac) or "nmcli" (Linux) via sudo.

XXHSUM COMMAND NOT FOUND ERROR on Linux or Mac
Install xxhash or use a different algorithm (e.g. crc32).

WINDOWS: strings are always short, but there are definitely quite a few APs visible?!
Have you installed winwifi?

****************************************************************************

# 6. HOW TO IMPLEMENT YOUR OWN SOLUTION

We suggest using existing command line tools to save time (e.g. nmcli, airport or netsh). Please note that netsh *DOES NOT* rescan the network and often displays just a single, currently used network.

Nevertheless, here's the procedure that works on all platforms:

a) obtain a unique machine id and username or user id
   You'll use this as a salt.
b) scan wifi and obtain BSSID + POWER
   Make sure you have privileges to do so
c) take (up to) the first 5 access points and their corresponding power levels
d) salt + hash each data point (each BSSID and strength) separately
   Do NOT salt+hash empty (nonexistent) networks
e) combine all (up to) 10 hashes
   Or empty strings for nonexistent APs

Additional hint:
Don't wait for the scanning to end within the survey; read ch. 5 for details on how to work around this (crontab).

# 7. AUTHORS

Developed as a part of work package 8 of the European Social Survey 2021-23 Work Programme.

Members:
- May Doušak (UL)
- Joost Kappelhof (SCP)
- Roberto Briceno-Rosas (GESIS)

Programming, technical contact:
- May Doušak may.dousak@fdv.uni-lj.si
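Illustration for chapter 6: steps c) to e) of the procedure could be sketched in Python roughly as follows. This is not the code of the provided script. It assumes the scan result is already available as (BSSID, strength) pairs, uses CRC32 from the standard library as a stand-in hash, appends the salt after each value (as the shell one-liners do), and formats the strength as a plain integer string. The names `hash8` and `vsi_code`, the sample scan and the sample salt are all hypothetical.

```python
import zlib


def hash8(value, salt):
    """Return an 8-character hex hash of `value` followed by `salt`.

    CRC32 is a stand-in here; any 32-bit hash gives 8 hex characters.
    """
    return format(zlib.crc32((value + salt).encode("utf-8")), "08x")


def vsi_code(access_points, salt, limit=5):
    """Build a VSI-style code from (bssid, strength) pairs.

    Only networks that actually exist are salted and hashed, so fewer
    than `limit` visible APs simply yield a shorter code.
    """
    # Strongest signal first, keep at most `limit` access points.
    strongest = sorted(access_points, key=lambda ap: ap[1], reverse=True)[:limit]
    parts = []
    for bssid, strength in strongest:
        parts.append(hash8(bssid, salt))          # hash of the salted BSSID
        parts.append(hash8(str(strength), salt))  # hash of the salted strength
    return "".join(parts)


# Hypothetical scan result and salt (machine id + username), for illustration.
scan = [("23:34:45:56:56:67", 79), ("12:34:56:78:90:ab", 80)]
salt = "0123456789abcdef0123456789abcdef" + "alice"
code = vsi_code(scan, salt)
print(code)
```

With two visible access points the code is 2*2*8 = 32 characters long; with five or more it reaches the full 80 characters.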