Spatial Usage Data Analysis for R

Lucas Braun

2018-08-02

Spatial usage data is information about where an application is used. Usage data from location-based services has unique properties and challenges. Each record is associated with at least one user, a time, and a spatial position. However, one or more of these properties is often unavailable or removed from the data in the interest of user privacy. So what can we learn from this data?

Introduction

The goal of this package is to give researchers and app developers an easy way to visualize and analyze how and where people use a given service.

For my thesis I am looking at how to reduce the social isolation of forced migrants (e.g. refugees and asylum seekers) through the use of location-based services. Once I have collected data in a real-world study, this package will be particularly useful for identifying usage patterns that are not otherwise obvious.

Loading data

The spud package is designed to import data from a CSV file whose columns look something like this:

user datetime latitude longitude action
1 2018-07-10 10:30:19 51.98354 7.663667 Checked in
1 2018-07-11 11:40:40 51.97083 7.668409 Checked in
1 2018-07-11 5:13:43 51.97866 7.664976 Liked post
1 2018-07-12 4:33:55 51.93486 7.663182 Checked in
1 2018-07-13 12:15:37 51.99216 7.650585 Checked in
1 2018-07-14 2:48:52 51.95266 7.623008 Liked post
1 2018-07-15 5:3:50 51.98650 7.643382 Added friend
1 2018-07-16 10:4:25 51.92937 7.673198 Added friend
1 2018-07-16 4:17:39 51.92975 7.645269 Liked post
1 2018-07-18 9:52:43 51.98668 7.630936 Added friend
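As a rough illustration of what reading this layout involves, here is a base-R sketch that parses a few of the rows above. It assumes the file is comma-separated with exactly these five column names, and it treats the timestamps as UTC because the data does not state a time zone; spud itself may handle both points differently.

```r
# Minimal base-R sketch: parse rows in the layout above.
# Assumptions: comma-separated file, columns user, datetime, latitude,
# longitude, action, and timestamps interpreted as UTC.
csv_text <- "user,datetime,latitude,longitude,action
1,2018-07-10 10:30:19,51.98354,7.663667,Checked in
1,2018-07-11 5:13:43,51.97866,7.664976,Liked post
1,2018-07-15 5:3:50,51.98650,7.643382,Added friend"

usage <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strptime() accepts single-digit hours, minutes and seconds (e.g. "5:3:50"),
# so one format string covers every row.
usage$datetime <- as.POSIXct(usage$datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

str(usage)
```

Rows such as `5:3:50` are why the format string matters: a parser that insisted on zero-padded fields would return `NA` for them.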

This is done with the read.spud function, which takes the name of the source file and the coordinate reference system (CRS) of the data as parameters. Here crs = 4326 is the EPSG code for WGS 84 longitude/latitude coordinates:

spud = read.spud(file = "dummy_data.csv", crs = 4326)
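For readers curious about what such a loader has to do, the sketch below reproduces the general idea with the sf package: read the CSV, parse the timestamps, and turn the two coordinate columns into a POINT geometry in the given CRS. `read_spud_sketch` is a hypothetical name, not part of spud, and the UTC time zone is an assumption.

```r
library(sf)

# Hypothetical stand-in for read.spud(): CSV -> sf POINT collection.
read_spud_sketch <- function(file, crs = 4326) {
  usage <- read.csv(file, stringsAsFactors = FALSE)
  usage$datetime <- as.POSIXct(usage$datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  # coords are given as c(x, y): longitude first, then latitude.
  # st_as_sf() drops the two coordinate columns by default (remove = TRUE).
  st_as_sf(usage, coords = c("longitude", "latitude"), crs = crs)
}

# Usage with a small temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "user,datetime,latitude,longitude,action",
  "1,2018-07-10 10:30:19,51.98354,7.663667,Checked in",
  "1,2018-07-11 11:40:40,51.97083,7.668409,Checked in"
), tmp)
spud_sketch <- read_spud_sketch(tmp, crs = 4326)
print(spud_sketch)
```

Because sf expects x before y, longitude must come first in `coords`, which is why the points print as POINT (longitude latitude).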

It returns a simple feature collection (an sf object) in which the latitude and longitude columns have been combined into a POINT geometry:

#> Simple feature collection with 500 features and 3 fields
#> Attribute-geometry relationship: 3 constant, 0 aggregate, 0 identity
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 7.560358 ymin: 51.92021 xmax: 7.689844 ymax: 52
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> First 10 features:
#>    user            datetime       action                  geometry
#> 1     1 2018-07-10 10:30:19   Checked in POINT (7.663667 51.98354)
#> 2     1 2018-07-11 11:40:40   Checked in POINT (7.668409 51.97083)
#> 3     1 2018-07-11 05:13:43   Liked post POINT (7.664976 51.97866)
#> 4     1 2018-07-12 04:33:55   Checked in POINT (7.663182 51.93486)
#> 5     1 2018-07-13 12:15:37   Checked in POINT (7.650585 51.99216)
#> 6     1 2018-07-14 02:48:52   Liked post POINT (7.623008 51.95266)
#> 7     1 2018-07-15 05:03:50 Added friend  POINT (7.643382 51.9865)
#> 8     1 2018-07-16 10:04:25 Added friend POINT (7.673198 51.92937)
#> 9     1 2018-07-16 04:17:39   Liked post POINT (7.645269 51.92975)
#> 10    1 2018-07-18 09:52:43 Added friend POINT (7.630936 51.98668)

Classes

The spud package provides an object-oriented framework for quickly and intuitively exploring spatial usage data. I chose R6 classes partly because I wanted to learn something new, and partly because R6 classes, with their reference semantics and methods that belong to the object, resemble classes in other object-oriented languages.

The spud package’s two classes represent the most important objects in a location-based service: the application and its users.

App class

The App class is a simple container for application usage data with methods that let us visualize the data from a holistic perspective. It is initialized from a simple feature collection, such as the one returned by the read.spud function:

app = App$new(name = "My fancy app", usage_data = spud)
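To make the R6 structure concrete, here is a minimal sketch of a class along these lines, written with the R6 package. `AppSketch` and its fields are hypothetical simplifications, not spud's source. It accepts any data frame with a `user` column, so an sf collection works too. A public `print()` method is what R6 calls when the object is printed, which is one way to produce a one-line summary such as `<App> My fancy app, an app with 20 users`.

```r
library(R6)

# Hypothetical, simplified App class; the real spud class may differ.
AppSketch <- R6Class("AppSketch",
  public = list(
    name = NULL,
    usage_data = NULL,
    initialize = function(name, usage_data) {
      stopifnot(is.character(name), length(name) == 1, "user" %in% names(usage_data))
      self$name <- name
      self$usage_data <- usage_data
    },
    # R6 uses a public print() method, if defined, when the object is printed
    print = function(...) {
      n_users <- length(unique(self$usage_data$user))
      cat("<App> ", self$name, ", an app with ", n_users, " users\n", sep = "")
      invisible(self)
    }
  )
)

demo_data <- data.frame(
  user = c(1, 1, 2),
  action = c("Checked in", "Liked post", "Checked in"),
  stringsAsFactors = FALSE
)
app_sketch <- AppSketch$new(name = "My fancy app", usage_data = demo_data)
print(app_sketch)
```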

The spud package also provides a convenience function, read.spud_app, that loads data from a file directly into an App instance:

app = read.spud_app(file = "dummy_data.csv", crs = 4326, name = "My fancy app")

App defines a print method that shows at a glance which app we are looking at:

print(app)
#> <App> My fancy app, an app with 20 users

Once we have our App object initialized, we might want to know who its users are:

app$users() # The unique ids of our users
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
app$user_count() # The total number of users
#> [1] 20

More interestingly, we might want to know where our app has been used. In any app, users can take certain actions, and when we look at where the app is used, it helps to know which functionality was used at each location. The actions_map method therefore shows both where the app was used and what it was used for:

app$actions_map()
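A map like this can be built with leaflet by colouring each point according to its action. The sketch below is one way to do that, not spud's actual implementation; `actions_map_sketch`, the "Set1" palette and the marker styling are illustrative choices. leaflet reads the geometry of an sf object directly but expects longitude/latitude (EPSG:4326) coordinates, so data in another CRS would need `sf::st_transform()` first.

```r
library(sf)
library(leaflet)

# Hypothetical helper: one circle marker per record, coloured by action.
actions_map_sketch <- function(data) {
  # Map each distinct action to a colour from the RColorBrewer "Set1" palette
  pal <- colorFactor(palette = "Set1", domain = data$action)
  leaflet(data) %>%
    addTiles() %>%                               # default OpenStreetMap tiles
    addCircleMarkers(color = ~pal(action), radius = 5, stroke = FALSE,
                     fillOpacity = 0.8, popup = ~action) %>%
    addLegend(pal = pal, values = ~action, title = "Action")
}

demo_points <- st_as_sf(
  data.frame(
    action = c("Checked in", "Liked post", "Added friend"),
    longitude = c(7.663667, 7.664976, 7.643382),
    latitude = c(51.98354, 51.97866, 51.98650),
    stringsAsFactors = FALSE
  ),
  coords = c("longitude", "latitude"), crs = 4326
)
actions_map <- actions_map_sketch(demo_points)
```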

This method uses the leaflet package by default, but can also be run using the mapview package. The mapview-flavored map shows more information on each data point but uses a less detailed basemap.

app$actions_map(flavor = "mapview")
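Choosing between two mapping back ends with a `flavor` argument is a small dispatch problem. The sketch below shows one way to write it with the standard `match.arg()` idiom; `map_by_flavor` is a hypothetical name, and the actual spud functions likely build richer maps. mapview's `zcol` argument colours the points by the named column.

```r
library(sf)
library(leaflet)
library(mapview)

# Hypothetical dispatcher: the first value listed in `flavor` is the default.
map_by_flavor <- function(data, flavor = c("leaflet", "mapview")) {
  flavor <- match.arg(flavor)  # errors on anything other than the listed flavors
  switch(flavor,
    leaflet = leaflet(data) %>% addTiles() %>% addCircleMarkers(popup = ~action),
    mapview = mapview(data, zcol = "action")
  )
}

demo_points <- st_as_sf(
  data.frame(action = c("Checked in", "Liked post"),
             longitude = c(7.663667, 7.664976),
             latitude = c(51.98354, 51.97866),
             stringsAsFactors = FALSE),
  coords = c("longitude", "latitude"), crs = 4326
)
leaflet_map <- map_by_flavor(demo_points)
mapview_map <- map_by_flavor(demo_points, flavor = "mapview")
```

Listing the allowed values in the default lets `match.arg()` both pick the default and reject typos with an informative error.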

We might also want to know which actions users take first, and where they are when they first use the app. For this we have the first_actions_map method, which plots where each user took their first action:

app$first_actions_map()
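Finding each user's first action amounts to sorting the records by user and time and keeping the first row per user. The base-R sketch below assumes a data frame (or sf object) with `user` and `datetime` columns, `datetime` being POSIXct; `first_actions_sketch` is a hypothetical name and the demo records are made up.

```r
# Hypothetical helper: the earliest record of every user.
first_actions_sketch <- function(usage_data) {
  # Sort by user, then by time, so each user's earliest record comes first
  ordered <- usage_data[order(usage_data$user, usage_data$datetime), ]
  # duplicated() is FALSE only for the first occurrence of each user id
  ordered[!duplicated(ordered$user), ]
}

demo <- data.frame(
  user = c(1, 1, 2, 2),
  datetime = as.POSIXct(c("2018-07-11 05:13:43", "2018-07-10 10:30:19",
                          "2018-07-31 08:52:01", "2018-07-30 09:00:00"), tz = "UTC"),
  action = c("Liked post", "Checked in", "Checked in", "Added friend"),
  stringsAsFactors = FALSE
)
first_actions_sketch(demo)
```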

User class

The User class is also a simple container for application usage data, but with methods focused on the perspective of just one user. It can also be initialized from a file or simple feature collection, but often we simply ask an App to get one of its Users for us:

user = app$get_user(user_id = 12)

Like the App class, the User class has a print method that identifies who we are looking at:

print(user)
#> Selecting by geometry
#> <User> 12 (active since July 31, 2018)

This method uses the first_action method, which returns the earliest record for this user:

user$first_action()
#> Selecting by geometry
#> Simple feature collection with 1 feature and 3 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 7.603854 ymin: 51.92322 xmax: 7.603854 ymax: 51.92322
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#>   user            datetime     action                  geometry
#> 1   12 2018-07-31 08:52:01 Checked in POINT (7.603854 51.92322)
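The "Selecting by geometry" lines above match the message dplyr's `top_n()` prints when it is called without a `wt` argument and therefore ranks rows by the last column, which in an sf object is the geometry. If that is their source, the row is being chosen by geometry rather than by time. The sketch below selects the first action explicitly by timestamp and formats the "active since" date; both function names are hypothetical, and the date format is only an assumption based on the printed output. The first demo row is the record shown above; the second is made up.

```r
# Hypothetical helper: the record with the earliest timestamp.
first_action_sketch <- function(user_data) {
  user_data[which.min(user_data$datetime), ]
}

# Hypothetical helper: "Month day, Year" without a leading zero on the day.
# %B gives the month name in the current locale ("July" in English or C).
active_since <- function(user_data) {
  first_time <- first_action_sketch(user_data)$datetime
  paste0(format(first_time, "%B "), as.integer(format(first_time, "%d")),
         format(first_time, ", %Y"))
}

demo_user <- data.frame(
  user = c(12, 12),
  datetime = as.POSIXct(c("2018-08-01 10:00:00", "2018-07-31 08:52:01"), tz = "UTC"),
  action = c("Liked post", "Checked in"),
  stringsAsFactors = FALSE
)
active_since(demo_user)
```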

The User class has the same actions_map method as the App class. It works the same way but displays only this user's data:

user$actions_map()

For those interested in the temporal dimension as well, spud can also show the user's path through their environment:

user$path_map()
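Drawing a path means putting one user's points in chronological order and joining them into a line. Here is a minimal sf sketch, assuming a POINT collection with a POSIXct `datetime` column; `user_path_sketch` is a hypothetical name, and the demo points are user 1's first records from the sample data.

```r
library(sf)

# Hypothetical helper: one user's points joined in chronological order.
user_path_sketch <- function(user_points) {
  ordered <- user_points[order(user_points$datetime), ]
  coords <- st_coordinates(ordered)[, c("X", "Y"), drop = FALSE]
  st_sfc(st_linestring(coords), crs = st_crs(user_points))
}

demo_points <- st_as_sf(
  data.frame(
    datetime = as.POSIXct(c("2018-07-12 04:33:55", "2018-07-10 10:30:19",
                            "2018-07-11 11:40:40"), tz = "UTC"),
    longitude = c(7.663182, 7.663667, 7.668409),
    latitude = c(51.93486, 51.98354, 51.97083)
  ),
  coords = c("longitude", "latitude"), crs = 4326
)
demo_path <- user_path_sketch(demo_points)
```

The resulting line can then be drawn with, for example, `leaflet::addPolylines()`.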

General map methods

Both classes rely on shared plotting functions so that code is not duplicated.

There are two plot_usage_actions functions, one for each mapping package:

plot_usage_actions_leaflet(data = spud)
plot_usage_actions_mapview(data = spud)

Plotting user paths is also generalized, so that multiple user paths can be plotted on the same map in the future:

plot_user_path(data = spud, user_id = 1)
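Extending this to several users is mostly a matter of building one line per user. The sketch below, with the hypothetical name `user_paths_sketch`, splits the points by user, orders each group by time and returns an sf collection of LINESTRINGs that could be drawn together on one map. Each user needs at least two records for a meaningful line; the demo records for user 2 are made up.

```r
library(sf)

# Hypothetical helper: one chronological LINESTRING per selected user.
user_paths_sketch <- function(data, user_ids = unique(data$user)) {
  data <- data[data$user %in% user_ids, ]
  paths <- lapply(split(data, data$user), function(d) {
    d <- d[order(d$datetime), ]                    # chronological order
    st_linestring(st_coordinates(d)[, c("X", "Y"), drop = FALSE])
  })
  # split() names the groups by user id; keep those names as a column
  st_sf(user = names(paths), geometry = st_sfc(unname(paths), crs = st_crs(data)))
}

demo_points <- st_as_sf(
  data.frame(
    user = c(1, 2, 1, 2),
    datetime = as.POSIXct(c("2018-07-11 11:40:40", "2018-07-12 09:00:00",
                            "2018-07-10 10:30:19", "2018-07-13 09:00:00"), tz = "UTC"),
    longitude = c(7.668409, 7.603854, 7.663667, 7.650585),
    latitude = c(51.97083, 51.92322, 51.98354, 51.99216)
  ),
  coords = c("longitude", "latitude"), crs = 4326
)
demo_paths <- user_paths_sketch(demo_points)
```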

Development notes

Package check

The package passes R CMD check on my machine except for one note, which concerns the dplyr syntax in the plot_user_path function. dplyr's filter and arrange functions let you refer to data frame columns by their bare names, but the checker cannot see where these names are defined and reports them as global variables without a visible binding.
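Two common ways to silence this note in package code are sketched below: refer to columns through the `.data` pronoun, or declare the names with `utils::globalVariables()`. `user_records` is a hypothetical function mirroring a filter-then-arrange step like the one described above. Giving the argument a different name from the column (`user_id` rather than `user`) also keeps the column from masking the argument inside `filter()`.

```r
library(dplyr)

# Hypothetical filtering step written so that R CMD check can tell that
# `user` and `datetime` are data columns rather than undefined globals.
# In a package, .data is imported from rlang (dplyr re-exports it),
# for example with the roxygen tag  @importFrom rlang .data
user_records <- function(data, user_id) {
  data %>%
    filter(.data$user == user_id) %>%   # .data$user always means the column
    arrange(.data$datetime)
}

# Alternative: declare the bare names once at package level (e.g. in R/globals.R):
# utils::globalVariables(c("user", "datetime"))

demo <- data.frame(
  user = c(2, 1, 1),
  datetime = as.POSIXct(c("2018-07-10 10:30:19", "2018-07-12 04:33:55",
                          "2018-07-11 11:40:40"), tz = "UTC")
)
user_records(demo, user_id = 1)
```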

Future work

There are many ways I would like to expand this package if I have time. Here are a few of the tasks that didn’t fit into the initial development time frame: