This week we are going to be working with images as a form of data.
Last week we looked at how to import images and photographs and how to read the EXIF metadata that is attached to these images. This week we are going to go through a series of examples to show how we can pull out different forms of data from a set of images and use this to create data visualizations.
I have taken a set of example pictures for us to use for our class today. They are not very interesting pictures and are of me traveling from where I used to live in Hebden Bridge (outside Manchester) to a classroom in Warwick. On my journey I tried to take a picture of what was in front of me every 5 minutes. This dataset was not captured with the intention of linking it to a larger political/economic/environmental issue; rather, it was created so that the EXIF metadata was easily extractable and varied.
The code below shows examples of ways in which we can extract and work with data from image files. I would like you to read through the examples and think about if or how they could be useful for your project. Please feel free to ask questions, try it out for yourself, or adapt it for your own work!
For CDT students, all these examples are written in R. There is no expectation that you learn R or run this code. Rather, you can treat these as examples of what you can do programmatically with images and think about how you could do something similar in your notebooks.
What's the data?
Before diving into what we can do with these images and the EXIF data, let's have a look at the images themselves.
library(magick)     # load in the libraries we need to work with images
library(grDevices)

par(mfrow = c(5, 10),    # set our canvas to plot with 5 rows and 10 columns
    mar = c(0, 0, 0, 0)) # set no margins around our plots

# there are 49 images, so we load them all in through a simple loop
for (n in 1:49) {
  # all the images are named 1.jpeg, 2.jpeg, etc., which makes them very easy to load in!
  picture_path <- paste0('media/photos/', n, '.jpeg')
  picture <- image_read(picture_path)
  plot(picture)
}
Figure 2.1: To get these images ready to use for today’s class, I modified them to make them much smaller using the image_resize(img, "x500" ) function. This makes it easier to host online and much quicker to work with in R.
If you would like to download a copy of these images to work through these examples you can do this from this link.
Loading EXIF Data
Let's start by loading the EXIF data from these photos. We can do this using the same tools and libraries that we looked at in last week's class.
Building a dataframe from EXIF data can sometimes be a little tricky as not all photographs may have the same number of metadata items attached to them. To overcome this challenge we are going to do this the easiest way possible by creating a blank data.frame and then looping through each picture, adding a new row to the dataframe each time.
To do this we are using the rbind.fill() function from the plyr library, as this function allows data.frames with unequal numbers of columns to be combined together. (This is a good example of a tricky, and potentially time-consuming, data cleaning issue that we talked about in Wednesday's class.)
library(exiftoolr) # load in our libraries
library(plyr)

all_exif <- data.frame() # create a blank dataframe

for (n in 1:49) {
  picture_path <- paste0('media/photos/', n, '.jpeg') # loop through our photos, as above
  picture_exif <- exif_read(picture_path)             # read the EXIF data
  all_exif <- rbind.fill(all_exif, picture_exif)
}
| EXIF field | Row 1 | Row 2 | Row 3 | Row 4 | Row 5 | Row 6 |
| --- | --- | --- | --- | --- | --- | --- |
| SourceFile | media/photos/1.jpeg | media/photos/2.jpeg | media/photos/3.jpeg | media/photos/4.jpeg | media/photos/5.jpeg | media/photos/6.jpeg |
| ExifToolVersion | 13.5 | 13.5 | 13.5 | 13.5 | 13.5 | 13.5 |
| FileName | 1.jpeg | 2.jpeg | 3.jpeg | 4.jpeg | 5.jpeg | 6.jpeg |
| Directory | media/photos | media/photos | media/photos | media/photos | media/photos | media/photos |
| FileSize | 75614 | 79574 | 103244 | 56051 | 114105 | 108813 |
| FileModifyDate | 2024:02:25 14:10:35+ | 2024:02:25 14:10:35+ | 2024:02:25 14:10:36+ | 2024:02:25 14:10:36+ | 2024:02:25 14:10:36+ | 2024:02:25 14:10:37+ |
| FileAccessDate | 2026:02:18 17:06:58+ | 2026:02:18 17:07:01+ | 2026:02:18 17:07:03+ | 2026:02:18 17:07:00+ | 2026:02:18 17:07:01+ | 2026:02:18 17:06:59+ |
| FileInodeChangeDate | 2026:02:12 14:53:40+ | 2026:02:12 14:53:42+ | 2026:02:12 14:53:42+ | 2026:02:12 14:53:42+ | 2026:02:12 14:53:42+ | 2026:02:12 14:53:42+ |
| FilePermissions | 100644 | 100644 | 100644 | 100644 | 100644 | 100644 |
| FileType | JPEG | JPEG | JPEG | JPEG | JPEG | JPEG |
| FileTypeExtension | JPG | JPG | JPG | JPG | JPG | JPG |
| MIMEType | image/jpeg | image/jpeg | image/jpeg | image/jpeg | image/jpeg | image/jpeg |
| JFIFVersion | 1 1 | 1 1 | 1 1 | 1 1 | 1 1 | 1 1 |
| HDRGainCurveSize | 251 | 251 | 251 | 251 | 251 | 251 |
| HDRGainCurve | 3691 7683 11382 1502 | 3304 6809 10550 1441 | 8069 15882 23790 316 | 12438 24946 37155 49 | 13268 26261 38926 51 | 13239 26702 40488 54 |
| ExifByteOrder | MM | MM | MM | MM | MM | MM |
| Make | Apple | Apple | Apple | Apple | Apple | Apple |
| Model | iPhone XS Max | iPhone XS Max | iPhone XS Max | iPhone XS Max | iPhone XS Max | iPhone XS Max |
| Orientation | 1 | 1 | 1 | 1 | 1 | 1 |
| XResolution | 72 | 72 | 72 | 72 | 72 | 72 |
| YResolution | 72 | 72 | 72 | 72 | 72 | 72 |
| ResolutionUnit | 2 | 2 | 2 | 2 | 2 | 2 |
| Software | 16.3.1 | 16.3.1 | 16.3.1 | 16.3.1 | 16.3.1 | 16.3.1 |
| ModifyDate | 2024:02:09 07:38:05 | 2024:02:09 07:42:56 | 2024:02:09 07:48:53 | 2024:02:09 07:53:55 | 2024:02:09 07:59:26 | 2024:02:09 08:04:39 |
| HostComputer | iPhone XS Max | iPhone XS Max | iPhone XS Max | iPhone XS Max | iPhone XS Max | iPhone XS Max |
| TileWidth | 512 | 512 | 512 | 512 | 512 | 512 |
| TileLength | 512 | 512 | 512 | 512 | 512 | 512 |
| YCbCrPositioning | 1 | 1 | 1 | 1 | 1 | 1 |
| ExposureTime | 0.01666666667 | 0.01666666667 | 0.04 | 0.01666666667 | 0.02 | 0.02040816327 |
| FNumber | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 |
| ExposureProgram | 2 | 2 | 2 | 2 | 2 | 2 |
| ISO | 320 | 160 | 320 | 250 | 200 | 200 |
| ExifVersion | 0232 | 0232 | 0232 | 0232 | 0232 | 0232 |
| DateTimeOriginal | 2024:02:09 07:38:05 | 2024:02:09 07:42:56 | 2024:02:09 07:48:53 | 2024:02:09 07:53:55 | 2024:02:09 07:59:26 | 2024:02:09 08:04:39 |
| CreateDate | 2024:02:09 07:38:05 | 2024:02:09 07:42:56 | 2024:02:09 07:48:53 | 2024:02:09 07:53:55 | 2024:02:09 07:59:26 | 2024:02:09 08:04:39 |
| ComponentsConfiguration | 1 2 3 0 | 1 2 3 0 | 1 2 3 0 | 1 2 3 0 | 1 2 3 0 | 1 2 3 0 |
| ShutterSpeedValue | 0.0165679999782223 | 0.0165679999782223 | 0.0399920000731997 | 0.0165679999782223 | 0.0199939999476312 | 0.0205200000829189 |
| ApertureValue | 1.79999999993144 | 1.79999999993144 | 1.79999999993144 | 1.79999999993144 | 1.79999999993144 | 1.79999999993144 |
| BrightnessValue | 1.510567669 | 2.564540738 | -0.0478944311 | 1.583796104 | 1.729386043 | 1.789725209 |
| ExposureCompensation | 0 | 0 | 0 | 0 | 0 | 0 |
| MeteringMode | 5 | 5 | 5 | 5 | 5 | 5 |
| Flash | 16 | 16 | 16 | 16 | 16 | 16 |
| FocalLength | 4.25 | 4.25 | 4.25 | 4.25 | 4.25 | 4.25 |
| SubjectArea | 2013 1511 2217 1393 | 2013 1511 2217 1393 | 2013 1511 2217 1393 | 2013 1511 2217 1393 | 2013 1511 2217 1393 | 2013 1511 2217 1393 |
| MakerNoteVersion | 14 | 14 | 14 | 14 | 14 | 14 |
| RunTimeFlags | 1 | 1 | 1 | 1 | 1 | 1 |
| RunTimeValue | 176357126276750 | 176608655067208 | 176878939595083 | 177066956766750 | 177392672294583 | 177704726360250 |
| RunTimeScale | 1000000000 | 1000000000 | 1000000000 | 1000000000 | 1000000000 | 1000000000 |
| RunTimeEpoch | 0 | 0 | 0 | 0 | 0 | 0 |
| AEStable | 1 | 1 | 1 | 1 | 0 | 0 |
| AETarget | 186 | 173 | 172 | 169 | 190 | 170 |
| AEAverage | 191 | 176 | 166 | 164 | 166 | 177 |
| AFStable | 1 | 1 | 1 | 1 | 1 | 1 |
| AccelerationVector | -0.03880077602 -1.00 | 0.0218595639 -1.0003 | 0.0207697507 -0.9942 | 0.07634519039 -0.791 | -0.0090702204 -1.004 | -0.08329442888 -0.80 |
| FocusDistanceRange | 2.9609375 2.30078125 | 0.7265625 0.4296875 | 1.59765625 1.09375 | 4.1171875 3.36328125 | 5.14453125 2.3046875 | 1.33984375 3.8007812 |
| ImageCaptureType | 10 | 10 | 10 | 10 | 10 | 10 |
| LivePhotoVideoIndex | 13639680 | 13639680 | 13639680 | 13639680 | 13639680 | 13639680 |
| PhotosAppFeatureFlags | 0 | 0 | 0 | 0 | 0 | 0 |
| HDRHeadroom | 1.686418056 | 1.686418056 | 0 | 0 | 1 | 1.568873048 |
| AFPerformance | 157 268435570 | 105 268435497 | 67 268435563 | 8 268435797 | 112 268435591 | 60 268435526 |
| SignalToNoiseRatio | 33.00339506 | 36.20163728 | 28.24822425 | 33.22625733 | 33.62224582 | 33.80025099 |
| PhotoIdentifier | 0C54A3EE-31E2-43C9-A | 6D38E07B-7BF2-4836-9 | 25F6B2C6-8939-4EA6-8 | 37AC8A42-8D1C-4BCF-B | D001CCA7-8D00-4C1F-9 | 25F23E88-98E6-4CC7-A |
| ColorTemperature | 7531 | 7531 | 2858 | 6618 | 6777 | 6560 |
| CameraType | 1 | 1 | 1 | 1 | 1 | 1 |
| FocusPosition | 59 | 59 | 65 | 231 | 54 | 67 |
| SubSecTimeOriginal | 758 | 168 | 566 | 658 | 889 | 020 |
| SubSecTimeDigitized | 758 | 168 | 566 | 658 | 889 | 020 |
| FlashpixVersion | 0100 | 0100 | 0100 | 0100 | 0100 | 0100 |
| ColorSpace | 65535 | 65535 | 65535 | 65535 | 65535 | 65535 |
| ExifImageWidth | 4032 | 4032 | 4032 | 4032 | 4032 | 4032 |
| ExifImageHeight | 3024 | 3024 | 3024 | 3024 | 3024 | 3024 |
| SensingMethod | 2 | 2 | 2 | 2 | 2 | 2 |
| SceneType | 1 | 1 | 1 | 1 | 1 | 1 |
| ExposureMode | 0 | 0 | 0 | 0 | 0 | 0 |
| WhiteBalance | 0 | 0 | 0 | 0 | 0 | 0 |
| FocalLengthIn35mmFormat | 26 | 26 | 26 | 26 | 26 | 26 |
| SceneCaptureType | 0 | 0 | 0 | 0 | 0 | 0 |
| LensInfo | 4.25 6 1.8 2.4 | 4.25 6 1.8 2.4 | 4.25 6 1.8 2.4 | 4.25 6 1.8 2.4 | 4.25 6 1.8 2.4 | 4.25 6 1.8 2.4 |
| LensMake | Apple | Apple | Apple | Apple | Apple | Apple |
| LensModel | iPhone XS Max back d | iPhone XS Max back d | iPhone XS Max back d | iPhone XS Max back d | iPhone XS Max back d | iPhone XS Max back d |
| CompositeImage | 2 | 2 | 2 | 2 | 2 | 2 |
| GPSLatitudeRef | N | N | N | N | N | N |
| GPSLongitudeRef | W | W | W | W | W | W |
| GPSAltitudeRef | 0 | 0 | 0 | 0 | 0 | 0 |
| GPSTimeStamp | 20:00:00 | 20:00:00 | 20:00:00 | 20:00:00 | 20:00:00 | 20:00:00 |
| GPSSpeedRef | K | K | K | K | K | K |
| GPSSpeed | 0.02899493462 | 2.097243071 | 0 | 0.2655441764 | 0.6883943678 | 0 |
| GPSImgDirectionRef | T | T | T | T | T | T |
| GPSImgDirection | 253.1574557 | 112.0013657 | 244.7424623 | 86.43673707 | 314.2805789 | 292.9943236 |
| GPSDestBearingRef | T | T | T | T | T | T |
| GPSDestBearing | 253.1574557 | 112.0013657 | 244.7424623 | 86.43673707 | 314.2805789 | 292.9943236 |
| GPSDateStamp | 2024:02:09 | 2024:02:09 | 2024:02:09 | 2024:02:09 | 2024:02:09 | 2024:02:09 |
| GPSHPositioningError | 13.7876972 | 33.24521744 | 8.590117966 | 17.58620243 | 28.84374421 | 55 |
| ProfileCMMType | appl | appl | appl | appl | appl | appl |
| ProfileVersion | 1024 | 1024 | 1024 | 1024 | 1024 | 1024 |
| ProfileClass | mntr | mntr | mntr | mntr | mntr | mntr |
| ColorSpaceData | RGB | RGB | RGB | RGB | RGB | RGB |
| ProfileConnectionSpace | XYZ | XYZ | XYZ | XYZ | XYZ | XYZ |
| ProfileDateTime | 2022:01:01 00:00:00 | 2022:01:01 00:00:00 | 2022:01:01 00:00:00 | 2022:01:01 00:00:00 | 2022:01:01 00:00:00 | 2022:01:01 00:00:00 |
| ProfileFileSignature | acsp | acsp | acsp | acsp | acsp | acsp |
| PrimaryPlatform | APPL | APPL | APPL | APPL | APPL | APPL |
| CMMFlags | 0 | 0 | 0 | 0 | 0 | 0 |
| DeviceManufacturer | APPL | APPL | APPL | APPL | APPL | APPL |
| DeviceModel |  |  |  |  |  |  |
| DeviceAttributes | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 |
| RenderingIntent | 0 | 0 | 0 | 0 | 0 | 0 |
| ConnectionSpaceIlluminant | 0.9642 1 0.82491 | 0.9642 1 0.82491 | 0.9642 1 0.82491 | 0.9642 1 0.82491 | 0.9642 1 0.82491 | 0.9642 1 0.82491 |
| ProfileCreator | appl | appl | appl | appl | appl | appl |
| ProfileID | 236 253 163 142 56 1 | 236 253 163 142 56 1 | 236 253 163 142 56 1 | 236 253 163 142 56 1 | 236 253 163 142 56 1 | 236 253 163 142 56 1 |
| ProfileDescription | Display P3 | Display P3 | Display P3 | Display P3 | Display P3 | Display P3 |
| ProfileCopyright | Copyright Apple Inc. | Copyright Apple Inc. | Copyright Apple Inc. | Copyright Apple Inc. | Copyright Apple Inc. | Copyright Apple Inc. |
| MediaWhitePoint | 0.96419 1 0.82489 | 0.96419 1 0.82489 | 0.96419 1 0.82489 | 0.96419 1 0.82489 | 0.96419 1 0.82489 | 0.96419 1 0.82489 |
| RedMatrixColumn | 0.51512 0.2412 -0.00 | 0.51512 0.2412 -0.00 | 0.51512 0.2412 -0.00 | 0.51512 0.2412 -0.00 | 0.51512 0.2412 -0.00 | 0.51512 0.2412 -0.00 |
| GreenMatrixColumn | 0.29198 0.69225 0.04 | 0.29198 0.69225 0.04 | 0.29198 0.69225 0.04 | 0.29198 0.69225 0.04 | 0.29198 0.69225 0.04 | 0.29198 0.69225 0.04 |
| BlueMatrixColumn | 0.1571 0.06657 0.784 | 0.1571 0.06657 0.784 | 0.1571 0.06657 0.784 | 0.1571 0.06657 0.784 | 0.1571 0.06657 0.784 | 0.1571 0.06657 0.784 |
| RedTRC | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA |
| ChromaticAdaptation | 1.04788 0.02292 -0.0 | 1.04788 0.02292 -0.0 | 1.04788 0.02292 -0.0 | 1.04788 0.02292 -0.0 | 1.04788 0.02292 -0.0 | 1.04788 0.02292 -0.0 |
| BlueTRC | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA |
| GreenTRC | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA | base64:cGFyYQAAAAAAA |
| ImageWidth | 375 | 375 | 375 | 375 | 375 | 375 |
| ImageHeight | 500 | 500 | 500 | 500 | 500 | 500 |
| EncodingProcess | 0 | 0 | 0 | 0 | 0 | 0 |
| BitsPerSample | 8 | 8 | 8 | 8 | 8 | 8 |
| ColorComponents | 3 | 3 | 3 | 3 | 3 | 3 |
| YCbCrSubSampling | 2 2 | 2 2 | 2 2 | 2 2 | 2 2 | 2 2 |
| RunTimeSincePowerUp | 176357.12627675 | 176608.655067208 | 176878.939595083 | 177066.95676675 | 177392.672294583 | 177704.72636025 |
| Aperture | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 |
| ImageSize | 375 500 | 375 500 | 375 500 | 375 500 | 375 500 | 375 500 |
| Megapixels | 0.1875 | 0.1875 | 0.1875 | 0.1875 | 0.1875 | 0.1875 |
| ScaleFactor35efl | 6.11764705882353 | 6.11764705882353 | 6.11764705882353 | 6.11764705882353 | 6.11764705882353 | 6.11764705882353 |
| ShutterSpeed | 0.01666666667 | 0.01666666667 | 0.04 | 0.01666666667 | 0.02 | 0.02040816327 |
| SubSecCreateDate | 2024:02:09 07:38:05. | 2024:02:09 07:42:56. | 2024:02:09 07:48:53. | 2024:02:09 07:53:55. | 2024:02:09 07:59:26. | 2024:02:09 08:04:39. |
| SubSecDateTimeOriginal | 2024:02:09 07:38:05. | 2024:02:09 07:42:56. | 2024:02:09 07:48:53. | 2024:02:09 07:53:55. | 2024:02:09 07:59:26. | 2024:02:09 08:04:39. |
| GPSAltitude | 110.8395769 | 101.0746739 | 105.9569265 | 107.8988345 | 109.6631615 | 109.6561203 |
| GPSDateTime | 2024:02:09 20:00:00Z | 2024:02:09 20:00:00Z | 2024:02:09 20:00:00Z | 2024:02:09 20:00:00Z | 2024:02:09 20:00:00Z | 2024:02:09 20:00:00Z |
| GPSLatitude | 53.7416444444444 | 53.7405083333333 | 53.7378888888889 | 53.7379305555556 | 53.7378305555556 | 53.7378055555556 |
| GPSLongitude | -2.01625555555556 | -2.01409444444444 | -2.00883055555556 | -2.00911388888889 | -2.00918888888889 | -2.00898055555556 |
| CircleOfConfusion | 0.00491140798741088 | 0.00491140798741088 | 0.00491140798741088 | 0.00491140798741088 | 0.00491140798741088 | 0.00491140798741088 |
| FOV | 69.3903656740024 | 69.3903656740024 | 69.3903656740024 | 69.3903656740024 | 69.3903656740024 | 69.3903656740024 |
| FocalLength35efl | 26 | 26 | 26 | 26 | 26 | 26 |
| GPSPosition | 53.7416444444444 -2. | 53.7405083333333 -2. | 53.7378888888889 -2. | 53.7379305555556 -2. | 53.7378305555556 -2. | 53.7378055555556 -2. |
| HyperfocalDistance | 2.04314572276293 | 2.04314572276293 | 2.04314572276293 | 2.04314572276293 | 2.04314572276293 | 2.04314572276293 |
| LightValue | 5.92481250331724 | 6.92481250331724 | 4.66177809777199 | 6.28095631354252 | 6.33985000288463 | 6.31070365689329 |
| LensID | iPhone XS Max back d | iPhone XS Max back d | iPhone XS Max back d | iPhone XS Max back d | iPhone XS Max back d | iPhone XS Max back d |
| XMPToolkit | NA | NA | NA | NA | XMP Core 6.0.0 | NA |
| CreatorTool | NA | NA | NA | NA | 16.3.1 | NA |
| DateCreated | NA | NA | NA | NA | 2024:02:09 07:59:26 | NA |
| RegionAreaY | NA | NA | NA | NA | 0.68385714285714294 | NA |
| RegionAreaW | NA | NA | NA | NA | 0.026190476190476208 | NA |
| RegionAreaX | NA | NA | NA | NA | 0.53928571428571437 | NA |
| RegionAreaH | NA | NA | NA | NA | 0.034571428571428586 | NA |
| RegionAreaUnit | NA | NA | NA | NA | normalized | NA |
| RegionType | NA | NA | NA | NA | Face | NA |
| RegionExtensionsAngleInfoYaw | NA | NA | NA | NA | 0 | NA |
| RegionExtensionsAngleInfoRoll | NA | NA | NA | NA | 90 | NA |
| RegionExtensionsConfidenceLevel | NA | NA | NA | NA | 80 | NA |
| RegionExtensionsFaceID | NA | NA | NA | NA | 2 | NA |
| RegionAppliedToDimensionsH | NA | NA | NA | NA | 3024 | NA |
| RegionAppliedToDimensionsW | NA | NA | NA | NA | 4032 | NA |
| RegionAppliedToDimensionsUnit | NA | NA | NA | NA | pixel | NA |
Figure 3.1: Note: To make this table easier to read online I’ve clipped each cell to show a maximum of 20 characters and included only the first 6 rows of the dataframe.
If you would like to download just a copy of this dataframe you can do so using this link.
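To see what rbind.fill() is doing inside the loop above, here is a rough base R sketch of the same idea. The two one-row data frames are made up (loosely modelled on our photos), and bind_with_fill() is a hypothetical helper written only for this illustration; it is not how plyr implements rbind.fill(). The point is simply that columns missing from one data frame are filled with NA before the rows are stacked.

```r
# Two made-up single-row data frames that do not share all of their columns
photo_a <- data.frame(SourceFile = "1.jpeg", ISO = 320)
photo_b <- data.frame(SourceFile = "5.jpeg", ISO = 200, RegionType = "Face")

# Hypothetical helper: add any missing columns as NA, then stack the rows
bind_with_fill <- function(a, b) {
  all_cols <- union(names(a), names(b))
  for (col in setdiff(all_cols, names(a))) a[[col]] <- NA
  for (col in setdiff(all_cols, names(b))) b[[col]] <- NA
  rbind(a[, all_cols, drop = FALSE], b[, all_cols, drop = FALSE])
}

combined <- bind_with_fill(photo_a, photo_b)
print(combined) # the first photo has NA in the RegionType column
```

This is why most of our photos show NA in the Region… columns: only one of the first six pictures carried that extra metadata.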
Direct Visualization
The first thing we can explore with some of this EXIF data is direct visualization techniques, as we saw in the Lev Manovich article.
Since my photos are not (yet) connected to a bigger issue we are going to use the variable GPSHPositioningError and see how this changes across the pictures. We are going to start off simply and just use an if statement built into a loop. We are using the same loop we used to load in all our pictures above, and then plotting them with an opacity and a color, either blue or orange, laid over the top.
par(mfrow = c(5, 10), mar = c(0, 0, 0, 0))

for (n in 1:49) {
  # now that we have the EXIF data, we use the 'SourceFile' variable
  # from the dataframe to load each picture
  picture <- image_read(all_exif[n, ]$SourceFile)
  gps_error <- all_exif[n, ]$GPSHPositioningError # set the variable

  if (gps_error <= 500) {
    plot(image_colorize(picture, opacity = 50, color = 'blue'))
  } else {
    plot(image_colorize(picture, opacity = 50, color = 'orange'))
  }
}
We now have a direct data visualization! But our classification is quite simple. We can make this a little more interesting by using a color spectrum.
We can use the colorRampPalette function to achieve this. Our palette is going to run from blue to orange again. To set our color ramp we are going to set it against the max value (using the max() function). In this case it will create a color ramp with 11,000 colors! But when we plot them all together it will give us a nice smooth gradient that looks like this:
We can then incorporate this into our loop (taking out the if statements) and set the color of each photo by putting our colour_ramp and GPSHPositioningError values together.
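The two steps above can be sketched as follows. The error values here are made up (rounded from the first few photos), and rounding each value up with ceiling() to use it as an index into the ramp is my assumption about one simple way to pair the two together.

```r
# Made-up positioning errors in metres; in the class data these would
# come from all_exif$GPSHPositioningError
gps_error <- c(13.8, 33.2, 8.6, 17.6, 28.8, 55)

# colorRampPalette() returns a function; calling it with a number gives
# that many colors running from blue to orange
n_colors <- ceiling(max(gps_error))
colour_ramp <- colorRampPalette(c('blue', 'orange'))(n_colors)

# Round each error up to a whole number and use it as a position in the ramp
photo_colors <- colour_ramp[ceiling(gps_error)]
print(photo_colors)
```

Inside the loop, each value of photo_colors would then be passed to image_colorize() as the color for that picture.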
It looks like we have one extreme value that is throwing off our nice color ramp. (For the keen-eyed or frequent travelers, it's Wolverhampton station.)
We could do a quick bit of Exploratory Data Analysis to learn some more about the distribution of our values, for example with a box and whisker plot:
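A minimal sketch of this kind of check, using made-up values with one deliberately extreme entry standing in for our outlier:

```r
# Made-up positioning errors; the last value plays the role of our extreme photo
gps_error <- c(13.8, 33.2, 8.6, 17.6, 28.8, 55, 11000)

# A quick numerical summary and a box and whisker plot
print(summary(gps_error))
boxplot(gps_error, horizontal = TRUE)

# boxplot.stats() reports the values that the plot draws as outliers
outliers <- boxplot.stats(gps_error)$out
print(outliers)
```

With the class data you would swap the made-up vector for all_exif$GPSHPositioningError.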
To overcome this issue we can classify our variable according to a set of intervals. We have looked at variable splits in other classes, and there are lots of other ways of doing this (I am a big fan of this example).
To do this we are going to use the classInt library and the cut() function and assign the results to a new variable in our dataframe we are calling error_cats.
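Here is a rough sketch of the classification step. To keep it self-contained I have swapped in base R's quantile() to compute the break points (the classInt library offers several ways of choosing these), and the error values are made up. The choice of five quantile classes is an assumption for illustration.

```r
# Made-up positioning errors, including one extreme value
gps_error <- c(13.8, 33.2, 8.6, 17.6, 28.8, 55, 11000, 21.4, 9.9, 40.1)

# Five classes with (roughly) the same number of photos in each
breaks <- quantile(gps_error, probs = seq(0, 1, length.out = 6))

# cut() assigns each value to an interval; labels = FALSE returns 1 to 5
error_cats <- cut(gps_error, breaks = breaks,
                  include.lowest = TRUE, labels = FALSE)

# A five-color ramp, one color per class
colour_ramp <- colorRampPalette(c('blue', 'orange'))(5)
print(data.frame(gps_error, error_cats, color = colour_ramp[error_cats]))
```

Because the classes are based on rank rather than raw size, the extreme value simply lands in the top class instead of stretching the whole ramp.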
Finally, we could change the order in which we are plotting our data. We can use the order() function to do this. Since we are using a loop, however, we do not need to re-order our data. We can simply create a vector that loops through our pictures in an order defined by the GPSHPositioningError rank.
# We can use the order function to do this for us
gps_error_order <- rownames(all_exif[order(all_exif$GPSHPositioningError), ])

par(mfrow = c(5, 10), mar = c(0, 0, 0, 0))

for (n in gps_error_order) { # changing the order
  picture_path <- paste0(all_exif[n, ]$SourceFile)
  picture <- image_read(picture_path)
  plot(image_colorize(picture, opacity = 50,
                      color = colour_ramp[all_exif[n, ]$error_cats]))
}
GPS Data
Our EXIF metadata also includes GPS data. In its simplest form, GPS data is just a set of coordinates, one x and one y. This means we don’t even need a maps package to plot it. We can send these coordinates straight to the plot function to see what they look like.
While this is easy to plot, it's not very decipherable. To make my journey a little easier to understand we can turn this into a line plot and include a sequence of numbers alongside.
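Both plots can be sketched with just a handful of points. The coordinates below are rounded from the first six photos in the EXIF table; with the full dataset you would use all_exif$GPSLongitude and all_exif$GPSLatitude instead.

```r
# Coordinates of the first six photos, rounded from the EXIF table
lon <- c(-2.01626, -2.01409, -2.00883, -2.00911, -2.00919, -2.00898)
lat <- c(53.74164, 53.74051, 53.73789, 53.73793, 53.73783, 53.73781)

# 1. The coordinates sent straight to plot(), longitude as x and latitude as y
plot(lon, lat, pch = 19)

# 2. The same points joined up as a line, numbered in the order they were taken
plot(lon, lat, type = 'l', col = 'grey')
text(lon, lat, labels = seq_along(lon))
```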
Still not great, but it makes a little more sense. Let's bring this all together and plot our images and our graph side by side. This next section gets a little complicated in how we are setting out our canvas in R, so don’t be overwhelmed if it doesn’t make immediate sense! (This may be very helpful to come back to later depending on what software you want to use to approach this project).
We can use par(omd=) to set up a par window within our canvas. We can then call par() again to set out our 5x10 picture grid.
Using the other side of the canvas is a little more complicated. We need to use par(omd=) to specify the other side of the canvas. Then we set out the number of rows again using par(mfrow=); in this case we just want 1 plot. Finally we have to use par(mfg=) so that our canvas does not reset when we call plot again.
par(omd = c(0, 0.5, 0, 1), mfrow = c(5, 10), mar = c(0, 0, 0, 0))

for (n in 1:49) {
  picture_path <- paste0(all_exif[n, ]$SourceFile)
  picture <- image_read(picture_path)
  plot(picture)
  text(200, 250,   # let's add in text to show the order of the pictures
       labels = n,
       cex = 3,
       col = colour_ramp[all_exif[n, ]$error_cats]) # and add in the color data
}

par(omd = c(0.5, 1, 0, 1), mfrow = c(1, 1), mfg = c(1, 1))
plot(all_exif$GPSLongitude, all_exif$GPSLatitude,
     pch = 19, cex = 1,
     col = colour_ramp[all_exif$error_cats],
     axes = FALSE)
lines(all_exif$GPSLongitude, all_exif$GPSLatitude,
      type = "l", col = 'grey')
text(all_exif$GPSLongitude, all_exif$GPSLatitude,
     labels = row(all_exif)[, 1],
     col = colour_ramp[all_exif$error_cats],
     pos = 3, offset = 0.3)
So far we have worked with our spatial data by simply reading our GPS coordinates into R just as they are. However, this doesn’t look quite right, as the points are stretched out and our axes are scaled according to the minimum and maximum values in our data rather than the fixed grid of longitude and latitude. We could fix this by playing around with the scale of our axes. However, the easier way to address this issue is by treating our GPS metadata as spatial data.
To work with this data as proper ‘spatial data’ we need to read it with a library that treats space in a more complex way. To do this we are going to use the sf library.
To start with we are going to read in a shapefile of the UK local authorities. We can then use this as an example that we can match up to our photo data.
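A sketch of the loading step is below. The file name is taken from the output that follows; the relative path is my assumption about where the shapefile sits in your project folder, so adjust it to wherever you have saved the file. The if statement is only there so that the snippet does nothing (rather than failing) when sf or the file is missing.

```r
# Assumed location of the shapefile relative to the project folder
shp_path <- 'media/local_authorites/LAD_DEC_2023_UK_BUC.shp'

# Only try to read the file if the sf package and the shapefile are available
if (requireNamespace('sf', quietly = TRUE) && file.exists(shp_path)) {
  la_data <- sf::st_read(shp_path)
}
```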
Reading layer `LAD_DEC_2023_UK_BUC' from data source
`/Users/u2272744/Dropbox/30 - 39 - Personal Teaching/32 - Data Visualization/32.IM942.2526 - Adv Viz/32.IM942.2526 - Adv Viz - Week 5 - Class/IMAGES_AS_DATA/media/local_authorites/LAD_DEC_2023_UK_BUC.shp'
using driver `ESRI Shapefile'
Simple feature collection with 361 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -116.1928 ymin: 7054.1 xmax: 655653.8 ymax: 1220310
Projected CRS: OSGB36 / British National Grid
par(omd = c(0, 1, 0, 1))
par(mfrow = c(1, 1))
plot(la_data)
Figure 5.1: If you are curious about administrative boundaries, what these variable names mean, and how to work with spatial data in detail we will be exploring this further in project 3.
Now we can convert our EXIF data into a spatial data format. The sf package provides easy tools to do this with. But first let's make a subset of this data, as we don’t need all of the variables from the EXIF data file.
all_exif_sf <- all_exif[, c("SourceFile", "GPSLongitude", "GPSLatitude")]

# And then convert this into a 'spatial' data.frame
all_exif_sf <- st_as_sf(all_exif_sf,
                        coords = c("GPSLongitude", "GPSLatitude"),
                        crs = st_crs(4326)) # I will explain this number soon!
plot(all_exif_sf)
Notice how, now we have converted our data into a spatial dataframe, that the points are not as stretched as when we were plotting them as simple x, y points? This is because base R was adjusting the x axis to fit the min and max of our data. However, as a spatial data format, sf tells R not to do this as it would distort the map!
Now let's plot the two files together to make sure everything is matching up.
plot(la_data$geometry)
points(all_exif$GPSLongitude, all_exif$GPSLatitude, col = 'red')
Wait, this doesn’t look right. All our points are the red circles out in the ocean, and we can tell from looking at the photographs themselves that this was not where they were taken!
So what has gone wrong? In this case, there is nothing wrong with the data but there is something off with the format in which the data is being read. When working with geographic data all the underlying points are encoded based on ‘projections’ of the earth. This is how points from a 3D sphere of our planet are translated into the 2D plane we use for making maps.
In this case each dataset has a different projection, encoded as the ‘crs’ or ‘coordinate reference system’. The UK data is based on the ‘OSGB36 / British National Grid’, which is used for all official data in the UK (it was created so that when you look at a UK map all the grid lines are straight rather than being slightly curved). The data from my phone, by contrast, is encoded using the standard global GPS crs known as WGS84.
We can align these two projections using st_transform. The number 4326 specifies the WGS84 system.
la_data <- st_transform(la_data, crs = 4326)
plot(la_data$geometry)
points(all_exif$GPSLongitude, all_exif$GPSLatitude, col = 'red')
Figure 5.2: Notice how the shape of the UK subtly changes and our points now line up!
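To see why the untransformed points landed in the ocean, it can help to look at one position in both systems. The sketch below takes the (rounded) coordinates of the first photo and converts them in the other direction, from WGS84 into the British National Grid, whose EPSG code is 27700. The if statement is only there so the snippet is skipped when sf is not installed.

```r
if (requireNamespace('sf', quietly = TRUE)) {
  library(sf)

  # The first photo's position as longitude / latitude (WGS84, EPSG:4326)
  photo_wgs84 <- st_sfc(st_point(c(-2.01626, 53.74164)), crs = 4326)

  # The same position expressed in British National Grid metres (EPSG:27700)
  photo_bng <- st_transform(photo_wgs84, crs = 27700)

  print(st_coordinates(photo_wgs84)) # values of about -2 and 54 (degrees)
  print(st_coordinates(photo_bng))   # values in the hundreds of thousands (metres)
}
```

When we plotted the raw longitude and latitude on top of the National Grid map, R read values of about -2 and 54 as metres, which puts them right next to the grid's origin, out at sea to the south-west of Britain.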
With our spatial data now matching up correctly we can bring the two datasets together by looking at where the data matches. Our GPS coordinates are not only good for plotting where our pictures were taken, but for linking up our photographs to other datasets. We can do this through an operation called points-in-polygons. Essentially we locate where our photos are in the shapes, or ‘polygons’, in the local authorities dataset we have just been plotting.
We can find where our points fit within these shapes using the st_intersection() function.
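Here is a toy version of points-in-polygons that you can run without any of the class data. The two square "local authorities" and the three "photos" are made up for illustration; only the column name LAD23NM is borrowed from the real dataset. With the class data, the equivalent call would presumably look like matching_points <- st_intersection(all_exif_sf, la_data).

```r
if (requireNamespace('sf', quietly = TRUE)) {
  library(sf)

  # Two made-up square polygons standing in for local authorities
  squares <- st_sf(
    LAD23NM = c("West", "East"),
    geometry = st_sfc(
      st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
      st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
    )
  )

  # Three made-up photo locations
  photos <- st_as_sf(data.frame(id = 1:3,
                                x = c(0.5, 1.5, 0.2),
                                y = c(0.5, 0.5, 0.8)),
                     coords = c("x", "y"))

  # Each point picks up the attributes of the polygon it falls inside
  matched <- st_intersection(photos, squares)
  matched <- matched[order(matched$id), ] # put the photos back in their original order
  print(matched)
}
```

One thing to watch for: any point that falls outside every polygon is dropped from the result, so it is worth checking that you still have one row per photo before using the row number to look up a picture.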
Using the same loop as we did for the examples above we can now bring these two datasets together and write in the local authority names where each photograph was taken.
par(mfrow = c(5, 10),    # set our canvas to plot with 5 rows and 10 columns
    mar = c(0, 0, 0, 0)) # set no margins around our plots

for (n in 1:49) {
  picture_path <- paste0(all_exif[n, ]$SourceFile)
  picture <- image_read(picture_path)
  plot(picture)
  text(200, 250,   # let's add in text to show the Local Authority
       labels = matching_points[n, ]$LAD23NM, # labeling
       cex = 1,
       col = 'red')
}
Time Data
One of the attributes attached to our photos is time. Time can be very useful for representing all sorts of things about our data! However, time needs to be stored in R as a specific type of value. This example walks through the very basics of working with time and showing how to convert strings into a time format.
We can check how our data is stored with the typeof() function.
typeof(all_exif$DateTimeOriginal)
[1] "character"
As a character field, our time data is currently seen by R as a string of letters and numbers. While this can be of use for double checking things, it means we cannot plot, add, subtract or visualise this data.
We can convert these strings into date formats, known in R as DateTimeClasses, with the strptime() function. However, we need to specify what the function is looking for. From looking at our data we can see that dates are being stored as:
"2024:02:09 11:58:45"
We can then use special characters, which we specify with %, to tell R what and where our times are:
%Y stands for year
%m for month
%d for day
%H for hour
%M for minutes
%S for seconds
There are many potential options here which you can find out with the ?strptime command.
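Putting those pieces together, a minimal sketch of the conversion looks like this. The three timestamps are copied from the first photos in our EXIF table. Setting tz = "UTC" is my assumption (this EXIF field does not carry a time zone), and wrapping the result in as.POSIXct() is a choice I have made because that class is easy to store in a data.frame column.

```r
# Three timestamps as they appear in the DateTimeOriginal column
stamps <- c("2024:02:09 07:38:05", "2024:02:09 07:42:56", "2024:02:09 07:48:53")

# Note the colons between year, month and day in the format string
parsed <- strptime(stamps, format = "%Y:%m:%d %H:%M:%S", tz = "UTC")
time_formated <- as.POSIXct(parsed)

print(class(time_formated)) # no longer "character"

# Now that R understands these as times we can subtract them
gap <- difftime(time_formated[2], time_formated[1], units = "secs")
print(gap)
```

With the class data the same call would be applied to all_exif$DateTimeOriginal and the result stored in a new column, which the code below refers to as time_formated.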
As our time data is now in the right format, we can visualize it. To create sample data I tried to take a picture every 5 minutes, so let's see how successful I was …
The code below is an easy way to accomplish this, but it isn’t the ‘best’ or most ‘efficient’ code. Trying to write ‘perfect’ code can often get in the way of what we are trying to achieve. The main question we should ask ourselves when we are coding is: does this help us accomplish our task?
As with all code, there are multiple different ways we can achieve the same result. With the below code, I’ve tried to make things as easy to follow as possible by reusing the loop we used to read in the EXIF data and not loading any external libraries for other functions that we could use to achieve the same result. For our purposes today a loop is sufficient and can be an easy way to get the results we are after.
# let's create a blank data frame; as we loop through our data we will add each
# new row to the bottom of the data frame
time_data <- data.frame()

# we are also going to start from our second value
for (n in 2:49) {
  # to get the difference we use n - 1 to select the previous value
  time_dif <- difftime(all_exif[n, ]$time_formated,
                       all_exif[n - 1, ]$time_formated,
                       units = 'secs')
  time_dif <- as.numeric(time_dif) # then have our time in seconds

  # then turn our results into a single row data.frame
  temp_time_data <- data.frame('SourceFile' = all_exif[n, ]$SourceFile,
                               'time_between' = time_dif)

  # and add this to our time_data data.frame
  time_data <- rbind(time_data, temp_time_data)
}

# and finally, see how far off I was from the 5 minute mark
time_data$five <- time_data$time_between - 300

# A simple plot
par(mfrow = c(1, 1))
plot(time_data$five, type = 'h')
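As one example of a different route to the same result, base R's diff() function works out the gap between each value and the one before it in a single step. This is only a sketch using the first four timestamps from our table (with the time zone again assumed to be UTC); the loop above does the same job.

```r
# The first four capture times from the EXIF table
time_formated <- as.POSIXct(c("2024-02-09 07:38:05", "2024-02-09 07:42:56",
                              "2024-02-09 07:48:53", "2024-02-09 07:53:55"),
                            tz = "UTC")

# Seconds between each photo and the one before it
time_between <- diff(as.numeric(time_formated))

# How far off the 5 minute (300 second) mark was I?
five <- time_between - 300
print(five)

plot(five, type = 'h')
```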
Color Data
So far we have worked with our pictures by extracting the EXIF data and overlaying information on top or plotting it separately. But the images themselves are made up of data. For this next example we are going to extract color data from our images and plot this on its own, giving us a more abstract representation of our pictures.
To do this we are going to use the package RImagePalette. There are several packages that can do operations like this, but this one is fast and simple to use so works well for this task. Other packages can extract color from images in different, more specific ways, so are worth looking up if you are interested!
We are also going to load in our images in a different way, using the jpeg library, as it represents our image as a matrix of its underlying data. This means that RImagePalette can directly perform operations on the underlying data.
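To get a feel for what this underlying data looks like, here is a made-up 2 x 2 pixel "image" built by hand. readJPEG() from the jpeg library returns the same kind of object for a real photo: a block of numbers between 0 and 1 arranged as rows x columns x color channels. In the class data the loading step would presumably look something like jpeg_example <- jpeg::readJPEG('media/photos/1.jpeg'), where the choice of photo is my assumption.

```r
# A made-up 2 x 2 image: rows x columns x 3 color channels (red, green, blue)
tiny_image <- array(0, dim = c(2, 2, 3))
tiny_image[1, 1, ] <- c(1, 0, 0) # top-left pixel: pure red
tiny_image[1, 2, ] <- c(0, 1, 0) # top-right pixel: pure green
tiny_image[2, 1, ] <- c(0, 0, 1) # bottom-left pixel: pure blue
tiny_image[2, 2, ] <- c(1, 1, 1) # bottom-right pixel: white

print(dim(tiny_image)) # rows, columns, channels

# Any single pixel can be turned back into a color R understands
pixel_hex <- rgb(tiny_image[1, 1, 1], tiny_image[1, 1, 2], tiny_image[1, 1, 3])
print(pixel_hex)
```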
We can then use the image_palette() function to pick out the nine most dominant colors from our example picture in a 3x3 grid.
col_pal <- image_palette(jpeg_example, n = 9)

par(mfrow = c(3, 3), mar = c(0, 0, 0, 0))
for (c in col_pal) {
  plot_blank()
  plot_background(col = c)
}
Now let's scale this up to all of our pictures. I’m going to pick 5 colors and plot them sequentially so we need 5*49 columns. We are also going to keep a running list of all the colors we generate during this loop, in the col_pal_all vector, for another visualization.
# I'm going to pick 5 colors and plot them sequentially so we need 5*49 cols
par(mfrow = c(1, 245), mar = c(0, 0, 0, 0))

# We are also going to keep a running list of all the colors we generate
# during this loop for another visualization
col_pal_all <- c()

for (n in 1:49) {
  picture_path <- paste0(all_exif[n, ]$SourceFile)
  picture <- jpeg::readJPEG(picture_path)
  col_pal <- image_palette(picture, n = 5)
  for (c in col_pal) {
    plot_blank()
    plot_background(col = c)
  }
  col_pal_all <- append(col_pal_all, col_pal)
}
With all our colors in the single vector col_pal_all we can sort them into an order to make a different visualization. However, sorting colors is a non-trivial problem, which this blog post dives into. To avoid this complexity we are going to use the lterpalettefinder library which has a color sorter function we can use for now.
library(lterpalettefinder)

col_pal_sorted <- palette_sort(col_pal_all)

par(mfrow = c(1, 245), mar = c(0, 0, 0, 0))
for (c in col_pal_sorted) {
  plot_blank()
  plot_background(col = c)
}
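If you are curious what a very simple color sort could look like, the sketch below orders four made-up colors by their hue using only base R. This is a stand-in I have written for illustration and not a description of what palette_sort() does internally. Sorting by hue alone ignores how light or saturated each color is, which is part of why color sorting is harder than it first appears.

```r
# Four made-up colors: blue, red, green and orange
cols <- c("#0000FF", "#FF0000", "#00FF00", "#FFA500")

# Convert the hex codes to red/green/blue values, then to hue/saturation/value
hsv_values <- rgb2hsv(col2rgb(cols))

# Order the colors by their hue (the "h" row)
sorted <- cols[order(hsv_values["h", ])]
print(sorted)
```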
OCR
Finally, we can extract data of what is contained in the images themselves, for example all of the text. To do this we are going to use a form of machine learning called OCR, or Optical Character Recognition.
To do this we are going to use Google's tesseract library.
library(tesseract)

# setting the language
tesseract(language = "eng")

text_data <- data.frame()

# Unlike all our other loops, for this task we need higher resolution images
# so we are loading in our pictures from a different directory
for (n in 1:49) {
  picture_path <- paste0('media/og_photos/', n, '.jpeg')
  picture <- image_read(picture_path)
  picture_text <- image_ocr_data(picture)
  picture_text$source <- all_exif[n, ]$SourceFile
  text_data <- rbind(text_data, picture_text)
}
Figure 8.1: Note this step requires higher resolution images which you need to download from the class Teams channel.
Looking at the text data that has been extracted from the images we can see that lots of it is unusable!
| …1 | word | confidence | bbox | source |
| --- | --- | --- | --- | --- |
| 1 | i | 50.11905 | 2590,0,2613,89 | media/photos/1.jpeg |
| 2 | ! | 35.61607 | 2655,8,2676,75 | media/photos/1.jpeg |
| 3 | “Vb | 64.10910 | 2745,0,2814,41 | media/photos/1.jpeg |
| 4 | J | 95.34794 | 2892,0,2930,98 | media/photos/1.jpeg |
| 5 | A | 25.13509 | 2949,0,2995,43 | media/photos/1.jpeg |
| 6 | oe | 40.92468 | 2446,99,2500,116 | media/photos/1.jpeg |
To address this issue we are going to filter out everything with a confidence level below 90:
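A minimal sketch of this filter, using the six rows shown in the table above as a small stand-in for the full text_data dataframe:

```r
# The six example rows from the OCR output above
text_data <- data.frame(
  word = c("i", "!", "\u201cVb", "J", "A", "oe"),
  confidence = c(50.11905, 35.61607, 64.10910, 95.34794, 25.13509, 40.92468)
)

# Keep only the rows where tesseract was at least 90% confident
ninty_plus_words <- text_data[text_data$confidence >= 90, ]
print(ninty_plus_words)
```

Of these six rows only one survives the filter, which gives a sense of how much of the raw OCR output is noise.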
Now that we have the words we extracted from our images we can create a data visualization from them. As both a bit of EDA and a straightforward bit of code, I’m going to make a wordcloud.
library(wordcloud2)

# all we have to do is count up how many times each word occurs
wordcloud_data <- as.data.frame(table(ninty_plus_words[, c("word")]))

# voila!
wordcloud2(wordcloud_data)
Class Challenge!
I created this dataset without a motivation of linking the data I collected to a larger issue. However, we could think about some ways in which this could be made to link the personal and political. To trial this out I have linked the data we have just worked with to some external datasets.
As a challenge for this class I would like you to download this csv and see what data visualizations you can produce.
I have included 5 new variables linked via the location each picture was taken. These are: