Images as Data

Author
Affiliation

Timothy Monteath

Published

February 14, 2026

Hello!

Welcome to IM946 week 5!

This week we are going to be working with images as a form of data.

Last week we looked at how to import images and photographs and how to read the EXIF metadata that is attached to these images. This week we are going to go through a series of examples to show how we can pull out different forms of data from a set of images and use this to create data visualizations.

I have taken a set of example pictures for us to use for our class today. They are not very interesting pictures and are of me traveling from where I used to live in Hebden Bridge (outside Manchester) to a classroom in Warwick. On my journey I tried to take a picture of what was in front of me every 5 minutes. This dataset is not captured with the intention of linking it to a larger political/economic/environmental issue, rather it was created so that the EXIF metadata was easily extractable and varied.

Today we are going to look at:

  • Loading EXIF data
  • Direct visualization
  • GPS data
  • Time data
  • Color data
  • OCR

The code below is example code showing ways in which we can extract and work with data from image files. I would like you to read through the examples and think about if or how they could be useful for your project. Please feel free to ask questions, try it out for yourself, or adapt it for your own work!

For CDT students, all these examples are written in R. There is no expectation that you learn R or run this code. Rather, you can treat these as examples of what you can do programmatically with images and think about how you could do something similar in your notebooks.

What’s the data?

Before diving into what we can do with these images and the EXIF data, let’s have a look at the images themselves.

library(magick) # load in the libraries we need to work with images
library(grDevices)

par(mfrow = c(5,10), # set our canvas to plot with 5 rows and 10 columns
    mar = c(0,0,0,0)) # set no margins around our plots

# there are 49 images, so we load them all in through a simple loop
for(n in 1:49){ 
  picture_path <- paste0('media/photos/', n, '.jpeg')
  # all the images are named 1.jpeg, 2.jpeg, etc. which makes them very easy to load in!
  picture <- image_read(picture_path)
  plot(picture)
}

Figure 2.1: To get these images ready to use for today’s class, I modified them to make them much smaller using the image_resize(img, "x500" ) function. This makes it easier to host online and much quicker to work with in R.

If you would like to download a copy of these images to work through these examples you can do this from this link.

Loading EXIF Data

Let’s start by loading the EXIF data from these photos. We can do this using the same tools and libraries that we looked at in last week’s class.

Building a dataframe from EXIF data can sometimes be a little tricky as not all photographs may have the same number of metadata items attached to them. To overcome this challenge we are going to do this the easiest way possible by creating a blank data.frame and then looping through each picture, adding a new row to the dataframe each time.

To do this we are using the rbind.fill() function from the plyr library, as this function allows data.frames with unequal numbers of columns to be combined together. This is a good example of a tricky, and potentially time consuming, data cleaning issue of the kind that we talked about in Wednesday’s class.

library(exiftoolr) # load in our libraries
library(plyr)

all_exif <- data.frame() # create a blank dataframe

for(n in 1:49){
  picture_path <- paste0('media/photos/', n, '.jpeg') # loop through our photos, as above
  picture_exif <- exif_read(picture_path) # read the EXIF data
  all_exif <- rbind.fill(all_exif, picture_exif)
}
SourceFile ExifToolVersion FileName Directory FileSize FileModifyDate FileAccessDate FileInodeChangeDate FilePermissions FileType FileTypeExtension MIMEType JFIFVersion HDRGainCurveSize HDRGainCurve ExifByteOrder Make Model Orientation XResolution YResolution ResolutionUnit Software ModifyDate HostComputer TileWidth TileLength YCbCrPositioning ExposureTime FNumber ExposureProgram ISO ExifVersion DateTimeOriginal CreateDate ComponentsConfiguration ShutterSpeedValue ApertureValue BrightnessValue ExposureCompensation MeteringMode Flash FocalLength SubjectArea MakerNoteVersion RunTimeFlags RunTimeValue RunTimeScale RunTimeEpoch AEStable AETarget AEAverage AFStable AccelerationVector FocusDistanceRange ImageCaptureType LivePhotoVideoIndex PhotosAppFeatureFlags HDRHeadroom AFPerformance SignalToNoiseRatio PhotoIdentifier ColorTemperature CameraType FocusPosition SubSecTimeOriginal SubSecTimeDigitized FlashpixVersion ColorSpace ExifImageWidth ExifImageHeight SensingMethod SceneType ExposureMode WhiteBalance FocalLengthIn35mmFormat SceneCaptureType LensInfo LensMake LensModel CompositeImage GPSLatitudeRef GPSLongitudeRef GPSAltitudeRef GPSTimeStamp GPSSpeedRef GPSSpeed GPSImgDirectionRef GPSImgDirection GPSDestBearingRef GPSDestBearing GPSDateStamp GPSHPositioningError ProfileCMMType ProfileVersion ProfileClass ColorSpaceData ProfileConnectionSpace ProfileDateTime ProfileFileSignature PrimaryPlatform CMMFlags DeviceManufacturer DeviceModel DeviceAttributes RenderingIntent ConnectionSpaceIlluminant ProfileCreator ProfileID ProfileDescription ProfileCopyright MediaWhitePoint RedMatrixColumn GreenMatrixColumn BlueMatrixColumn RedTRC ChromaticAdaptation BlueTRC GreenTRC ImageWidth ImageHeight EncodingProcess BitsPerSample ColorComponents YCbCrSubSampling RunTimeSincePowerUp Aperture ImageSize Megapixels ScaleFactor35efl ShutterSpeed SubSecCreateDate SubSecDateTimeOriginal GPSAltitude GPSDateTime GPSLatitude GPSLongitude CircleOfConfusion FOV FocalLength35efl GPSPosition 
HyperfocalDistance LightValue LensID XMPToolkit CreatorTool DateCreated RegionAreaY RegionAreaW RegionAreaX RegionAreaH RegionAreaUnit RegionType RegionExtensionsAngleInfoYaw RegionExtensionsAngleInfoRoll RegionExtensionsConfidenceLevel RegionExtensionsFaceID RegionAppliedToDimensionsH RegionAppliedToDimensionsW RegionAppliedToDimensionsUnit
media/photos/1.jpeg 13.5 1.jpeg media/photos 75614 2024:02:25 14:10:35+ 2026:02:18 17:06:58+ 2026:02:12 14:53:40+ 100644 JPEG JPG image/jpeg 1 1 251 3691 7683 11382 1502 MM Apple iPhone XS Max 1 72 72 2 16.3.1 2024:02:09 07:38:05 iPhone XS Max 512 512 1 0.01666666667 1.8 2 320 0232 2024:02:09 07:38:05 2024:02:09 07:38:05 1 2 3 0 0.0165679999782223 1.79999999993144 1.510567669 0 5 16 4.25 2013 1511 2217 1393 14 1 176357126276750 1000000000 0 1 186 191 1 -0.03880077602 -1.00 2.9609375 2.30078125 10 13639680 0 1.686418056 157 268435570 33.00339506 0C54A3EE-31E2-43C9-A 7531 1 59 758 758 0100 65535 4032 3024 2 1 0 0 26 0 4.25 6 1.8 2.4 Apple iPhone XS Max back d 2 N W 0 20:00:00 K 0.02899493462 T 253.1574557 T 253.1574557 2024:02:09 13.7876972 appl 1024 mntr RGB XYZ 2022:01:01 00:00:00 acsp APPL 0 APPL 0 0 0 0.9642 1 0.82491 appl 236 253 163 142 56 1 Display P3 Copyright Apple Inc. 0.96419 1 0.82489 0.51512 0.2412 -0.00 0.29198 0.69225 0.04 0.1571 0.06657 0.784 base64:cGFyYQAAAAAAA 1.04788 0.02292 -0.0 base64:cGFyYQAAAAAAA base64:cGFyYQAAAAAAA 375 500 0 8 3 2 2 176357.12627675 1.8 375 500 0.1875 6.11764705882353 0.01666666667 2024:02:09 07:38:05. 2024:02:09 07:38:05. 110.8395769 2024:02:09 20:00:00Z 53.7416444444444 -2.01625555555556 0.00491140798741088 69.3903656740024 26 53.7416444444444 -2. 2.04314572276293 5.92481250331724 iPhone XS Max back d NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
media/photos/2.jpeg 13.5 2.jpeg media/photos 79574 2024:02:25 14:10:35+ 2026:02:18 17:07:01+ 2026:02:12 14:53:42+ 100644 JPEG JPG image/jpeg 1 1 251 3304 6809 10550 1441 MM Apple iPhone XS Max 1 72 72 2 16.3.1 2024:02:09 07:42:56 iPhone XS Max 512 512 1 0.01666666667 1.8 2 160 0232 2024:02:09 07:42:56 2024:02:09 07:42:56 1 2 3 0 0.0165679999782223 1.79999999993144 2.564540738 0 5 16 4.25 2013 1511 2217 1393 14 1 176608655067208 1000000000 0 1 173 176 1 0.0218595639 -1.0003 0.7265625 0.4296875 10 13639680 0 1.686418056 105 268435497 36.20163728 6D38E07B-7BF2-4836-9 7531 1 59 168 168 0100 65535 4032 3024 2 1 0 0 26 0 4.25 6 1.8 2.4 Apple iPhone XS Max back d 2 N W 0 20:00:00 K 2.097243071 T 112.0013657 T 112.0013657 2024:02:09 33.24521744 appl 1024 mntr RGB XYZ 2022:01:01 00:00:00 acsp APPL 0 APPL 0 0 0 0.9642 1 0.82491 appl 236 253 163 142 56 1 Display P3 Copyright Apple Inc. 0.96419 1 0.82489 0.51512 0.2412 -0.00 0.29198 0.69225 0.04 0.1571 0.06657 0.784 base64:cGFyYQAAAAAAA 1.04788 0.02292 -0.0 base64:cGFyYQAAAAAAA base64:cGFyYQAAAAAAA 375 500 0 8 3 2 2 176608.655067208 1.8 375 500 0.1875 6.11764705882353 0.01666666667 2024:02:09 07:42:56. 2024:02:09 07:42:56. 101.0746739 2024:02:09 20:00:00Z 53.7405083333333 -2.01409444444444 0.00491140798741088 69.3903656740024 26 53.7405083333333 -2. 2.04314572276293 6.92481250331724 iPhone XS Max back d NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
media/photos/3.jpeg 13.5 3.jpeg media/photos 103244 2024:02:25 14:10:36+ 2026:02:18 17:07:03+ 2026:02:12 14:53:42+ 100644 JPEG JPG image/jpeg 1 1 251 8069 15882 23790 316 MM Apple iPhone XS Max 1 72 72 2 16.3.1 2024:02:09 07:48:53 iPhone XS Max 512 512 1 0.04 1.8 2 320 0232 2024:02:09 07:48:53 2024:02:09 07:48:53 1 2 3 0 0.0399920000731997 1.79999999993144 -0.0478944311 0 5 16 4.25 2013 1511 2217 1393 14 1 176878939595083 1000000000 0 1 172 166 1 0.0207697507 -0.9942 1.59765625 1.09375 10 13639680 0 0 67 268435563 28.24822425 25F6B2C6-8939-4EA6-8 2858 1 65 566 566 0100 65535 4032 3024 2 1 0 0 26 0 4.25 6 1.8 2.4 Apple iPhone XS Max back d 2 N W 0 20:00:00 K 0 T 244.7424623 T 244.7424623 2024:02:09 8.590117966 appl 1024 mntr RGB XYZ 2022:01:01 00:00:00 acsp APPL 0 APPL 0 0 0 0.9642 1 0.82491 appl 236 253 163 142 56 1 Display P3 Copyright Apple Inc. 0.96419 1 0.82489 0.51512 0.2412 -0.00 0.29198 0.69225 0.04 0.1571 0.06657 0.784 base64:cGFyYQAAAAAAA 1.04788 0.02292 -0.0 base64:cGFyYQAAAAAAA base64:cGFyYQAAAAAAA 375 500 0 8 3 2 2 176878.939595083 1.8 375 500 0.1875 6.11764705882353 0.04 2024:02:09 07:48:53. 2024:02:09 07:48:53. 105.9569265 2024:02:09 20:00:00Z 53.7378888888889 -2.00883055555556 0.00491140798741088 69.3903656740024 26 53.7378888888889 -2. 2.04314572276293 4.66177809777199 iPhone XS Max back d NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
media/photos/4.jpeg 13.5 4.jpeg media/photos 56051 2024:02:25 14:10:36+ 2026:02:18 17:07:00+ 2026:02:12 14:53:42+ 100644 JPEG JPG image/jpeg 1 1 251 12438 24946 37155 49 MM Apple iPhone XS Max 1 72 72 2 16.3.1 2024:02:09 07:53:55 iPhone XS Max 512 512 1 0.01666666667 1.8 2 250 0232 2024:02:09 07:53:55 2024:02:09 07:53:55 1 2 3 0 0.0165679999782223 1.79999999993144 1.583796104 0 5 16 4.25 2013 1511 2217 1393 14 1 177066956766750 1000000000 0 1 169 164 1 0.07634519039 -0.791 4.1171875 3.36328125 10 13639680 0 0 8 268435797 33.22625733 37AC8A42-8D1C-4BCF-B 6618 1 231 658 658 0100 65535 4032 3024 2 1 0 0 26 0 4.25 6 1.8 2.4 Apple iPhone XS Max back d 2 N W 0 20:00:00 K 0.2655441764 T 86.43673707 T 86.43673707 2024:02:09 17.58620243 appl 1024 mntr RGB XYZ 2022:01:01 00:00:00 acsp APPL 0 APPL 0 0 0 0.9642 1 0.82491 appl 236 253 163 142 56 1 Display P3 Copyright Apple Inc. 0.96419 1 0.82489 0.51512 0.2412 -0.00 0.29198 0.69225 0.04 0.1571 0.06657 0.784 base64:cGFyYQAAAAAAA 1.04788 0.02292 -0.0 base64:cGFyYQAAAAAAA base64:cGFyYQAAAAAAA 375 500 0 8 3 2 2 177066.95676675 1.8 375 500 0.1875 6.11764705882353 0.01666666667 2024:02:09 07:53:55. 2024:02:09 07:53:55. 107.8988345 2024:02:09 20:00:00Z 53.7379305555556 -2.00911388888889 0.00491140798741088 69.3903656740024 26 53.7379305555556 -2. 2.04314572276293 6.28095631354252 iPhone XS Max back d NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
media/photos/5.jpeg 13.5 5.jpeg media/photos 114105 2024:02:25 14:10:36+ 2026:02:18 17:07:01+ 2026:02:12 14:53:42+ 100644 JPEG JPG image/jpeg 1 1 251 13268 26261 38926 51 MM Apple iPhone XS Max 1 72 72 2 16.3.1 2024:02:09 07:59:26 iPhone XS Max 512 512 1 0.02 1.8 2 200 0232 2024:02:09 07:59:26 2024:02:09 07:59:26 1 2 3 0 0.0199939999476312 1.79999999993144 1.729386043 0 5 16 4.25 2013 1511 2217 1393 14 1 177392672294583 1000000000 0 0 190 166 1 -0.0090702204 -1.004 5.14453125 2.3046875 10 13639680 0 1 112 268435591 33.62224582 D001CCA7-8D00-4C1F-9 6777 1 54 889 889 0100 65535 4032 3024 2 1 0 0 26 0 4.25 6 1.8 2.4 Apple iPhone XS Max back d 2 N W 0 20:00:00 K 0.6883943678 T 314.2805789 T 314.2805789 2024:02:09 28.84374421 appl 1024 mntr RGB XYZ 2022:01:01 00:00:00 acsp APPL 0 APPL 0 0 0 0.9642 1 0.82491 appl 236 253 163 142 56 1 Display P3 Copyright Apple Inc. 0.96419 1 0.82489 0.51512 0.2412 -0.00 0.29198 0.69225 0.04 0.1571 0.06657 0.784 base64:cGFyYQAAAAAAA 1.04788 0.02292 -0.0 base64:cGFyYQAAAAAAA base64:cGFyYQAAAAAAA 375 500 0 8 3 2 2 177392.672294583 1.8 375 500 0.1875 6.11764705882353 0.02 2024:02:09 07:59:26. 2024:02:09 07:59:26. 109.6631615 2024:02:09 20:00:00Z 53.7378305555556 -2.00918888888889 0.00491140798741088 69.3903656740024 26 53.7378305555556 -2. 2.04314572276293 6.33985000288463 iPhone XS Max back d XMP Core 6.0.0 16.3.1 2024:02:09 07:59:26 0.68385714285714294 0.026190476190476208 0.53928571428571437 0.034571428571428586 normalized Face 0 90 80 2 3024 4032 pixel
media/photos/6.jpeg 13.5 6.jpeg media/photos 108813 2024:02:25 14:10:37+ 2026:02:18 17:06:59+ 2026:02:12 14:53:42+ 100644 JPEG JPG image/jpeg 1 1 251 13239 26702 40488 54 MM Apple iPhone XS Max 1 72 72 2 16.3.1 2024:02:09 08:04:39 iPhone XS Max 512 512 1 0.02040816327 1.8 2 200 0232 2024:02:09 08:04:39 2024:02:09 08:04:39 1 2 3 0 0.0205200000829189 1.79999999993144 1.789725209 0 5 16 4.25 2013 1511 2217 1393 14 1 177704726360250 1000000000 0 0 170 177 1 -0.08329442888 -0.80 1.33984375 3.8007812 10 13639680 0 1.568873048 60 268435526 33.80025099 25F23E88-98E6-4CC7-A 6560 1 67 020 020 0100 65535 4032 3024 2 1 0 0 26 0 4.25 6 1.8 2.4 Apple iPhone XS Max back d 2 N W 0 20:00:00 K 0 T 292.9943236 T 292.9943236 2024:02:09 55 appl 1024 mntr RGB XYZ 2022:01:01 00:00:00 acsp APPL 0 APPL 0 0 0 0.9642 1 0.82491 appl 236 253 163 142 56 1 Display P3 Copyright Apple Inc. 0.96419 1 0.82489 0.51512 0.2412 -0.00 0.29198 0.69225 0.04 0.1571 0.06657 0.784 base64:cGFyYQAAAAAAA 1.04788 0.02292 -0.0 base64:cGFyYQAAAAAAA base64:cGFyYQAAAAAAA 375 500 0 8 3 2 2 177704.72636025 1.8 375 500 0.1875 6.11764705882353 0.02040816327 2024:02:09 08:04:39. 2024:02:09 08:04:39. 109.6561203 2024:02:09 20:00:00Z 53.7378055555556 -2.00898055555556 0.00491140798741088 69.3903656740024 26 53.7378055555556 -2. 2.04314572276293 6.31070365689329 iPhone XS Max back d NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Figure 3.1: Note: To make this table easier to read online I’ve clipped each cell to show a maximum of 20 characters for the first 6 rows of the dataframe.

If you would like to download just a copy of this dataframe you can do so using this link.
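As a small illustration of the row-binding step above, the sketch below shows in base R what combining rows with unequal columns involves. The helper bind_fill() and the two toy rows are hypothetical: they only stand in for rbind.fill() and the real EXIF rows, and this is not a claim about how plyr implements it.

```r
# Hypothetical helper: stack two data frames, filling any columns that are
# missing from one of them with NA (roughly what rbind.fill() does for us).
bind_fill <- function(a, b) {
  if (ncol(a) == 0) return(b)  # nothing collected yet: start with b
  all_cols <- union(names(a), names(b))
  for (col in setdiff(all_cols, names(a))) a[[col]] <- NA
  for (col in setdiff(all_cols, names(b))) b[[col]] <- NA
  rbind(a[all_cols], b[all_cols])
}

# Two made-up "photos": the second has an extra metadata field
photo_1 <- data.frame(SourceFile = "1.jpeg", ISO = 320)
photo_2 <- data.frame(SourceFile = "2.jpeg", ISO = 160, RegionType = "Face")

combined <- data.frame()
combined <- bind_fill(combined, photo_1)
combined <- bind_fill(combined, photo_2)
print(combined)
```

The first photo ends up with NA in the RegionType column, which is the same pattern as the NA values at the end of most rows in the table above.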

Direct Visualization

The first thing we can explore with some of this EXIF data is direct visualization, using techniques like those we saw in the Lev Manovich article.

Since my photos are not (yet) connected to a bigger issue we are going to use the variable GPSHPositioningError and see how this changes across the pictures. We are going to start off simply and just use an if statement built into a loop. We are using the same loop we used to load in all our pictures above, and then plotting them with an opacity and a color, either blue or orange, laid over the top.

par(mfrow = c(5,10),
    mar = c(0,0,0,0))

for(n in 1:49){
  # now that we have the EXIF data, we read each picture using the
  # 'SourceFile' variable from the dataframe
  picture <- image_read(all_exif[n,]$SourceFile)
  
  gps_error <- all_exif[n,]$GPSHPositioningError # set the variable
  
  # if/else so that a value of exactly 500 is only plotted once
  if (gps_error <= 500){ 
    plot(image_colorize(picture,
                        opacity = 50,
                        color = 'blue'))
  } else {
    plot(image_colorize(picture,
                        opacity = 50,
                        color = 'orange'))
  }
}

We now have a direct data visualization! But our classification is quite simple. We can make this a little more interesting by using a color spectrum.

We can use the colorRampPalette function to achieve this. Our palette is going to run from blue to orange again. To set our color ramp we are going to set it against the max value (using the max() function). In this case it will create a color ramp with 11,000 colors! But when we plot them all together it will give us a nice smooth gradient that looks like this:
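A gradient like this can be drawn with a few lines of base R. The sketch below uses a shorter ramp of 100 colors (an arbitrary number chosen for illustration) rather than the full one, and the name example_ramp is made up for this example.

```r
# Build a function that interpolates between blue and orange
colfunc <- colorRampPalette(c("blue", "orange"))

# Ask it for 100 colors (an arbitrary number, just for this illustration)
example_ramp <- colfunc(100)

# The ramp is simply a vector of hex color codes
head(example_ramp, 3)

# Draw the ramp as a strip of colored cells, one per color
image(matrix(seq_along(example_ramp), ncol = 1),
      col = example_ramp, axes = FALSE)
```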

We can then incorporate this into our loop (taking out the if statements) and set the color of each photo by putting our colour_ramp and GPSHPositioningError values together.

colfunc <- colorRampPalette(c('blue', 'orange'))
# one color for every whole number up to the largest error value
colour_ramp <- colfunc(ceiling(max(all_exif$GPSHPositioningError)))

par(mfrow = c(5,10),
    mar = c(0,0,0,0))

for(n in 1:49){
  picture_path <- all_exif[n,]$SourceFile
  picture <- image_read(picture_path)
  
  gps_error <- all_exif[n,]$GPSHPositioningError
  
  # round up (to at least 1) so the error value can be used as a
  # position in the color ramp
  plot(image_colorize(picture,
                      opacity = 50,
                      color = colour_ramp[max(1, ceiling(gps_error))]))
}

Oh! This isn’t quite what we were expecting.

It looks like we have one extreme value that is throwing off our nice color ramp. (For the keen-eyed or frequent travelers, it’s Wolverhampton station.)

We could do a quick bit of Exploratory Data Analysis to learn some more about the distribution of our values, for example with a box and whisker plot:
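A minimal sketch of that check could look like the code below. It uses a handful of made-up error values (one deliberately extreme) rather than the real column; with our data you would pass all_exif$GPSHPositioningError instead.

```r
# Made-up positioning errors; the last one plays the part of the outlier
example_error <- c(8.6, 13.8, 17.6, 28.8, 33.2, 55, 11000)

# A numeric summary: compare the median with the maximum
summary(example_error)

# Which values would the box plot draw as outlying points?
outliers <- boxplot.stats(example_error)$out
outliers

# The plot itself; a log scale keeps the box visible next to the outlier
boxplot(example_error, log = "y")
```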

To overcome this issue we can classify our variable according to a set of intervals. We have looked at variable splits in other classes, and there are lots of other ways of doing this (I am a big fan of this example).

To do this we are going to use the classInt library and the cut() function and assign the results to a new variable in our dataframe we are calling error_cats.

library(classInt)

intervals <- classIntervals(all_exif$GPSHPositioningError,
                            n = 5)

all_exif$error_cats <- cut(all_exif$GPSHPositioningError, 
                           breaks = intervals$brks,
                           include.lowest = TRUE,
                           labels=c(1:5))

colfunc <- colorRampPalette(c('blue', 'orange'))
colour_ramp <- colfunc(5)

par(mfrow = c(5,10),
    mar = c(0,0,0,0))

for(n in 1:49){
  picture_path <- paste0(all_exif[n,]$SourceFile)
  picture <- image_read(picture_path)
  
  plot(image_colorize(picture,
                      opacity = 50,
                      color = colour_ramp[all_exif[n,]$error_cats])
       )
}
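To see what the interval step does on a small scale, here is a base R sketch with made-up values. It uses quantile() to find the break points, which is a stand-in for classIntervals() chosen for this illustration (as far as I am aware its default style is also quantile-based, but the real function offers other styles too).

```r
# Ten made-up values, one of them extreme
example_error <- c(1:9, 1000)

# Break points that split the values into 5 equally sized groups
example_breaks <- quantile(example_error, probs = seq(0, 1, by = 0.2))

# Assign each value to a category from 1 to 5
example_cats <- cut(example_error,
                    breaks = example_breaks,
                    include.lowest = TRUE,
                    labels = 1:5)

# The extreme value now simply falls in the top category
table(example_cats)
```

Because the categories are based on rank rather than size, one very large value no longer stretches the whole color ramp.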

Finally, we could change the order in which we are plotting our data. We can use the order() function to do this. Since we are using a loop, however, we do not need to re-order our data; we can simply create a vector that loops through our pictures in an order defined by the GPSHPositioningError rank.

# We can use the order function to do this for us 

gps_error_order <-rownames(all_exif[order(all_exif$GPSHPositioningError),])

par(mfrow = c(5,10),
    mar = c(0,0,0,0))

for(n in gps_error_order){ # changing the order
  picture_path <- paste0(all_exif[n,]$SourceFile)
  picture <- image_read(picture_path)
  
  plot(image_colorize(picture,
                      opacity = 50,
                      color = colour_ramp[all_exif[n,]$error_cats])
  )
}
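If order() is unfamiliar, this tiny example with made-up numbers shows what it returns: the positions of the values from smallest to largest, not the sorted values themselves.

```r
example_error <- c(30, 10, 20)

# Positions of the values in increasing order: the 2nd value is the smallest
example_order <- order(example_error)
example_order

# Using those positions as an index gives the sorted values
example_error[example_order]
```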

GPS Data

Our EXIF metadata also includes GPS data. In its simplest form, GPS data is just a set of coordinates, one x and one y. This means we don’t even need a maps package to plot it. We can send these coordinates straight to the plot function to see what they look like.

par(mfrow = c(1,1))

plot(all_exif$GPSLongitude, all_exif$GPSLatitude,
       pch=19, cex=1, col="red")

While this is easy to plot, it’s not very decipherable. To make my journey a little easier to understand we can turn this into a line plot and include a sequence of numbers alongside.

colfunc <- colorRampPalette(c('blue', 'orange'))
colour_ramp <- colfunc(49)

plot(all_exif$GPSLongitude, all_exif$GPSLatitude,
     type = "l",
     col = 'grey')
# label each point with its position in the sequence of photos
text(all_exif$GPSLongitude, all_exif$GPSLatitude,
     labels = seq_len(nrow(all_exif)),
     col = colour_ramp[seq_len(nrow(all_exif))])

Still not great, but it makes a little more sense. Let’s bring this all together and plot our images and our graph side by side. This next section gets a little complicated in how we are setting out our canvas in R, so don’t be overwhelmed if it doesn’t make immediate sense! (This may be very helpful to come back to later depending on what software you want to use to approach this project).

We can use par(omd=) to set up a par window within our canvas. We can then use par(mfrow=) to set out our 5x10 picture grid.

Using the other side of the canvas is a little more complicated. We need to use par(omd=) to specify the other side of the canvas. Then set out the number of rows again using par(mfrow=), in this case we just want 1 plot. Finally we have to use par(mfg=) so that our canvas does not reset when we call plot again.

par(omd=c(0,0.5,0,1),
    mfrow = c(5,10),
    mar = c(0,0,0,0))

for(n in 1:49){
  picture_path <- all_exif[n,]$SourceFile
  picture <- image_read(picture_path)
  plot(picture)
  text(200,250, # let's add in text to show the order of the pictures
       labels = n,
       cex= 3,
       col = colour_ramp[n]) # and color it by its place in the sequence
}
 
par(omd=c(0.5,1,0,1),
    mfrow=c(1, 1),
    mfg=c(1, 1))

# color the points by their place in the sequence, matching the picture labels
# (colour_ramp now holds 49 colors, one per photo)
photo_order <- seq_len(nrow(all_exif))

plot(all_exif$GPSLongitude, all_exif$GPSLatitude,
     pch=19, cex=1,
     col = colour_ramp[photo_order],
     axes = FALSE)
lines(all_exif$GPSLongitude, all_exif$GPSLatitude,
      type = "l",
      col = 'grey')

text(all_exif$GPSLongitude, all_exif$GPSLatitude,
     labels = photo_order,
     col = colour_ramp[photo_order],
     pos = 3,
     offset = 0.3)

So far we have worked with our spatial data by simply reading our GPS coordinates into R just as they are. However, this doesn’t look quite right, as the points are stretched out and our axes are scaled according to the minimum and maximum values in our data rather than the fixed grid of longitude and latitude. We could fix this by playing around with the scale of our axes. However, the easier way to address this issue is by treating our GPS metadata as spatial data.
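For reference, adjusting the axes by hand could look something like the sketch below. It relies on a common rule of thumb that is not used anywhere else in this class: one degree of longitude is narrower than one degree of latitude by a factor of cos(latitude), so the aspect ratio of the plot can be set to compensate. The coordinates are made up, in roughly the right part of England.

```r
# Made-up coordinates in roughly the right part of England
example_lon <- c(-2.02, -1.90, -1.75, -1.58)
example_lat <- c(53.74, 53.20, 52.60, 52.28)

# Convert the average latitude to radians and work out the aspect ratio
mean_lat_radians <- mean(example_lat) * pi / 180
aspect <- 1 / cos(mean_lat_radians)

# asp sets how long one unit on the y axis is relative to one on the x axis
plot(example_lon, example_lat,
     type = "b", pch = 19, col = "red",
     asp = aspect)
```

This only approximates a proper map projection, which is one reason a dedicated spatial library is the easier route.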

To work with this data as proper ‘spatial data’ we need to read it with a library that treats space in a more complex way. To do this we are going to use the sf library.

To start with we are going to read in a shapefile of the UK local authorities. We can then use this as an example that we can match up to our photo data.

library(sf)
la_data <- st_read('media/Local_authorites/LAD_DEC_2023_UK_BUC.shp')
Reading layer `LAD_DEC_2023_UK_BUC' from data source 
  `/Users/u2272744/Dropbox/30 - 39 - Personal Teaching/32 - Data Visualization/32.IM942.2526 - Adv Viz/32.IM942.2526 - Adv Viz - Week 5 - Class/IMAGES_AS_DATA/media/local_authorites/LAD_DEC_2023_UK_BUC.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 361 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -116.1928 ymin: 7054.1 xmax: 655653.8 ymax: 1220310
Projected CRS: OSGB36 / British National Grid
par(omd=c(0,1,0,1))
par(mfrow=c(1,1))

plot(la_data)

Figure 5.1: If you are curious about administrative boundaries, what these variable names mean, and how to work with spatial data in detail we will be exploring this further in project 3.

Now we can convert our EXIF data into a spatial data format. The sf package provides easy tools to do this with. But first let’s make a subset of this data, as we don’t need all of the variables from the EXIF data file.

all_exif_sf <- all_exif[, c("SourceFile", "GPSLongitude", "GPSLatitude")]

# And then convert this into a 'spatial' data.frame

all_exif_sf <- st_as_sf(all_exif_sf,
                        coords=c("GPSLongitude", "GPSLatitude"),
                        crs=st_crs(4326)) # I will explain this number soon!

plot(all_exif_sf)

Notice how, now we have converted our data into a spatial dataframe, that the points are not as stretched as when we were plotting them as simple x, y points? This is because base R was adjusting the x axis to fit the min and max of our data. However, as a spatial data format, sf tells R not to do this as it would distort the map!

Now let’s plot the two files together to make sure everything is matching up.

plot(la_data$geometry)
points(all_exif$GPSLongitude, all_exif$GPSLatitude, col = 'red')

Wait, this doesn’t look right. All our points are the red circles out in the ocean, and we can tell from looking at the photographs themselves that this was not where they were taken!

So what has gone wrong? In this case, there is nothing wrong with the data but there is something off with the format in which the data is being read. When working with geographic data all the underlying points are encoded based on ‘projections’ of the earth. This is how points from a 3D sphere of our planet are translated into the 2D plane we use for making maps.

In this case each dataset has a different projection, encoded as the ‘crs’ or ‘coordinate reference system’. The UK data is based on ‘OSGB36 / British National Grid’, which is used for all official data in the UK (it was created so that when you look at a UK map all the grid lines are straight rather than being slightly curved), whereas the data from my phone is encoded using the standard global GPS crs, known as WGS84.
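A quick way to see the mismatch is to compare the numbers themselves. The sketch below uses the bounding box printed when we read in the shapefile and the coordinates of the first photo; only the variable names are invented for this illustration.

```r
# Bounding box of the local authority data, in British National Grid units
bng_box <- c(xmin = -116.1928, ymin = 7054.1, xmax = 655653.8, ymax = 1220310)

# The first photo, in degrees of longitude and latitude (WGS84)
photo <- c(x = -2.016256, y = 53.74164)

# Where does the photo fall if its degrees are read as grid units?
# 0 means the left/bottom edge of the box and 1 the right/top edge.
relative_x <- (photo[["x"]] - bng_box[["xmin"]]) /
  (bng_box[["xmax"]] - bng_box[["xmin"]])
relative_y <- (photo[["y"]] - bng_box[["ymin"]]) /
  (bng_box[["ymax"]] - bng_box[["ymin"]])

round(c(x = relative_x, y = relative_y), 4)
```

Both values are very close to zero, and y is slightly below it, so the point lands at the bottom-left corner of the map, out at sea.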

We can align these two projections using st_transform. The number 4326 specifies the WGS84 system.

la_data <- st_transform(la_data, crs=4326)

plot(la_data$geometry)
points(all_exif$GPSLongitude, all_exif$GPSLatitude, col = 'red')

Figure 5.2: Notice how the shape of the UK subtly changes and our points now line up!

With our spatial data now matching up correctly, we can bring the two datasets together by looking at where the data matches. Our GPS coordinates are not only good for plotting where our pictures were taken, but also for linking our photographs to other datasets. We can do this through an operation called points-in-polygons. Essentially, we locate where our photos sit within the shapes, or ‘polygons’, of the local authorities dataset we have just been plotting.
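To get a feel for what a points-in-polygons test involves, here is a toy version in base R. The function point_in_polygon() is a hypothetical helper written for this illustration using the well-known ray-casting idea; it is not a claim about how sf works internally, and the square and points are made up.

```r
# Hypothetical helper: is the point (px, py) inside the polygon whose corners
# are given by the vectors poly_x and poly_y? Uses the ray-casting idea:
# count how many polygon edges a horizontal ray from the point crosses.
point_in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n  # index of the previous corner, starting with the last one
  for (i in seq_len(n)) {
    # Does the edge from corner j to corner i straddle the point's height?
    straddles <- (poly_y[i] > py) != (poly_y[j] > py)
    if (straddles) {
      # x position where that edge crosses the point's height
      cross_x <- poly_x[j] + (py - poly_y[j]) /
        (poly_y[i] - poly_y[j]) * (poly_x[i] - poly_x[j])
      if (px < cross_x) inside <- !inside
    }
    j <- i
  }
  inside
}

# A made-up square "local authority" and two made-up photo locations
square_x <- c(0, 4, 4, 0)
square_y <- c(0, 0, 4, 4)

in_middle <- point_in_polygon(2, 2, square_x, square_y)  # inside the square
to_right  <- point_in_polygon(5, 2, square_x, square_y)  # outside it
c(in_middle = in_middle, to_right = to_right)
```

A spatial library repeats this kind of test for every photo against every local authority shape, with a lot more care about edge cases and speed.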

We can find where our points fit within these shapes using the st_intersection() function.

matching_points <- st_intersection(la_data, all_exif_sf)
LAD23CD LAD23NM LAD23NMW BNG_E BNG_N LONG LAT GlobalID SourceFile geometry
259 E08000033 Calderdale NA 402618 424895 -1.9618 53.7205 af25e83c-93ad-4a81-bff4-6d5568a7aa4d media/photos/1.jpeg POINT (-2.016256 53.74164)
259.1 E08000033 Calderdale NA 402618 424895 -1.9618 53.7205 af25e83c-93ad-4a81-bff4-6d5568a7aa4d media/photos/2.jpeg POINT (-2.014094 53.74051)
259.2 E08000033 Calderdale NA 402618 424895 -1.9618 53.7205 af25e83c-93ad-4a81-bff4-6d5568a7aa4d media/photos/3.jpeg POINT (-2.008831 53.73789)
259.3 E08000033 Calderdale NA 402618 424895 -1.9618 53.7205 af25e83c-93ad-4a81-bff4-6d5568a7aa4d media/photos/4.jpeg POINT (-2.009114 53.73793)
259.4 E08000033 Calderdale NA 402618 424895 -1.9618 53.7205 af25e83c-93ad-4a81-bff4-6d5568a7aa4d media/photos/5.jpeg POINT (-2.009189 53.73783)
259.5 E08000033 Calderdale NA 402618 424895 -1.9618 53.7205 af25e83c-93ad-4a81-bff4-6d5568a7aa4d media/photos/6.jpeg POINT (-2.008981 53.73781)

Using the same loop as we did for the examples above, we can now bring these two datasets together and write in the name of the local authority where each photograph was taken.

par(mfrow = c(5,10), # set our canvas to plot with 5 rows and 10 columns
    mar = c(0,0,0,0)) # set no margins around our plots

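for(n in 1:49){
  picture_path <- all_exif[n,]$SourceFile
  picture <- image_read(picture_path)
  plot(picture)
  
  # look up this photo's row by file name rather than by position, in case
  # the rows of matching_points are not in the same order as all_exif
  la_row <- match(picture_path, matching_points$SourceFile)
  
  text(200,250, # let's add in text to show the Local Authority
       labels = matching_points$LAD23NM[la_row], # labeling
       cex = 1,
       col = 'red')
}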

Time Data

One of the attributes attached to our photos is time. Time can be very useful for representing all sorts of things about our data! However, time needs to be stored in R as a specific type of value. This example walks through the very basics of working with time and shows how to convert strings into a time format.

We can check how our data is stored with the typeof() function.

typeof(all_exif$DateTimeOriginal)
[1] "character"

As a character field, our time data is currently seen by R as a string of letters and numbers. While this can be of use for double checking things, it means we cannot plot, add, subtract or visualise this data.

We can convert these strings into date formats, known in R as DateTimeClasses, with the strptime() function. However, we need to specify what the function is looking for. From looking at our data we can see that dates are being stored as:

"2024:02:09 11:58:45"

We can then use special characters, which we specify with %, to tell R what and where our times are:

  • %Y stands for year
  • %m for month
  • %d for day
  • %H for hour
  • %M for minutes
  • %S for seconds

There are many potential options here which you can find out with the ?strptime command.

all_exif$time_formated <- strptime(all_exif$DateTimeOriginal, "%Y:%m:%d %H:%M:%S")
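As a quick check of the format string, it can help to try it on a single value first. In this sketch the variable names are made up and tz = "UTC" is only set so that the example prints the same on every computer.

```r
# One timestamp in the same layout as the EXIF field
example_time <- "2024:02:09 11:58:45"

# The format string matches the layout, colons and space included
parsed <- strptime(example_time, "%Y:%m:%d %H:%M:%S", tz = "UTC")
parsed

# Individual parts can now be pulled back out
format(parsed, "%H:%M")

# A format that does not match the text gives NA rather than an error
mismatched <- strptime(example_time, "%Y-%m-%d %H:%M:%S", tz = "UTC")
mismatched
```

A column full of NA values after conversion is usually a sign that the format string does not match the text.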

Now that R can read time as DateTimeClasses we can do mathematical operations on time just as we would with numbers.

For example we can calculate my total journey time from the first picture to the last.

time_total <- max(all_exif$time_formated) - min(all_exif$time_formated)

time_total
Time difference of 4.344444 hours

As our time data is now in the right format, we can now visualize it. To create sample data I tried to take a picture every 5 minutes, so let’s see how successful I was …

The code below is an easy way to accomplish this, but it isn’t the ‘best’ or most ‘efficient’ code. Trying to write ‘perfect’ code can often get in the way of what we are trying to achieve. The main question we should ask ourselves when we are coding is: does this help us accomplish our task?

As with all code, there are multiple different ways we can achieve the same result. With the below code, I’ve tried to make things as easy to follow as possible by reusing the loop we used to read in the EXIF data and not loading any external libraries for other functions that we could use to achieve the same result. For our purposes today a loop is sufficient and can be an easy way to get the results we are after.

# lets create a blank data frame, as we loop through our data we will add each
# new row to the bottom of the data frame
time_data <- data.frame()

# we are also going to start from our second value 
for (n in 2:49){
  # to get the difference we use n - 1 to select the previous value
  time_dif <- difftime(all_exif[n,]$time_formated,
                       all_exif[n - 1,]$time_formated,
                       units = 'secs')
  
  time_dif <- as.numeric(time_dif) # then we have our time in seconds
  
  # then turn our results into a single row data.frame
  temp_time_data <- data.frame('SourceFile' = all_exif[n,]$SourceFile, 
                               'time_between' = time_dif)
                                
  # and add this to our time_data data.frame 
  time_data <- rbind(time_data, temp_time_data)
}

# and finally, see how far off I was from the 5 minute mark

time_data$five <- time_data$time_between - 300

# A simple plot

par(mfrow=c(1,1))

plot(time_data$five, type='h')
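For comparison, one of the other ways to get the same gaps without a loop is base R’s diff() function. This is only a sketch: it uses three timestamps copied from the first rows of the EXIF table rather than the full column, and the variable names are made up.

```r
# Three timestamps copied from the first rows of the EXIF table
example_times <- c("2024:02:09 07:38:05",
                   "2024:02:09 07:42:56",
                   "2024:02:09 07:48:53")

# Convert to date-times (tz is fixed only to keep the example reproducible)
parsed <- as.POSIXct(example_times, format = "%Y:%m:%d %H:%M:%S", tz = "UTC")

# diff() subtracts each value from the one after it, giving gaps in seconds
gaps <- diff(as.numeric(parsed))
gaps

# How far each gap is from the 5 minute (300 second) target
gaps - 300
```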

Color Data

So far we have worked with our pictures by extracting the EXIF data and overlaying information on top or plotting it separately. But the images themselves are made up of data. For this next example we are going to extract color data from our images and plot this on its own, giving us a more abstract representation of our pictures

To do this we are going to use the package RImagePalette. There are several packages that can do operations like this, but this one is fast and simple to use so works well for this task. Other packages, can work with extracting color in different more specific ways from images so are worth looking up if you are interested!

We are also going to load in our images in a different way, using the jpeg library, as it represents our image as an array of its underlying pixel data. This means that RImagePalette can perform operations directly on that data.

library(RImagePalette)
library(baselines) 

jpeg_example <- jpeg::readJPEG(paste0(all_exif[1,]$SourceFile))
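
To make the idea of "an image as an array" more concrete, here is a small sketch using a made-up 2 x 2 pixel image rather than one of our photographs. It is shaped the way readJPEG() returns a color image by default: height x width x channel, with values between 0 and 1. The name tiny_image is just for illustration.

```r
# A made-up 2 x 2 image: dimensions are height x width x channel (R, G, B)
tiny_image <- array(0, dim = c(2, 2, 3))
tiny_image[1, 1, ] <- c(1, 0, 0)  # top-left pixel: red
tiny_image[1, 2, ] <- c(0, 1, 0)  # top-right pixel: green
tiny_image[2, 1, ] <- c(0, 0, 1)  # bottom-left pixel: blue
tiny_image[2, 2, ] <- c(1, 1, 1)  # bottom-right pixel: white

dim(tiny_image)  # height, width, number of channels

# Turn every pixel into a hex color code by combining its three channels.
# as.vector() reads each channel column by column (top-left, bottom-left,
# top-right, bottom-right).
pixel_hex <- rgb(as.vector(tiny_image[, , 1]),
                 as.vector(tiny_image[, , 2]),
                 as.vector(tiny_image[, , 3]))
```

Running dim() on jpeg_example in the same way should show the height and width of the photograph followed by its number of color channels.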

We can then use the image_palette() function to pick out the nine most dominant colors from our example picture and plot them in a 3x3 grid.

col_pal <- image_palette(jpeg_example, n=9) 

par(mfrow=c(3,3),
    mar = c(0,0,0,0))

for (c in col_pal){
  plot_blank()
  plot_background(col=c)
}
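
If you are curious how a palette can be pulled out of pixel data at all, here is a rough sketch of one possible approach: grouping similar pixels with k-means clustering from base R and using the average color of each group. This is not necessarily how image_palette() works internally, and the image below is synthetic (half reddish, half bluish) rather than one of our photographs.

```r
set.seed(42)  # k-means starts from random points, so fix the seed

# A synthetic 20 x 20 image: left half reddish, right half bluish
img <- array(0, dim = c(20, 20, 3))
img[, 1:10, 1]  <- runif(20 * 10, 0.8, 1)  # red channel high on the left
img[, 11:20, 3] <- runif(20 * 10, 0.8, 1)  # blue channel high on the right

# Reshape to one row per pixel, with columns for red, green and blue
pixels <- matrix(img, ncol = 3)

# Group the pixels into 2 clusters of similar color
clusters <- kmeans(pixels, centers = 2, nstart = 5)
centres <- clusters$centers

# The average color of each cluster becomes one palette entry
palette_hex <- rgb(centres[, 1], centres[, 2], centres[, 3])
```

For a real photograph you would ask for more clusters, and probably shrink the image first, as clustering millions of pixels can be slow.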

Now let’s scale this up to all of our pictures. I’m going to pick 5 colors per picture and plot them sequentially, so we need 5*49 = 245 columns. We are also going to keep a running list of all the colors we generate during this loop in the col_pal_all vector, ready for another visualization.

# I'm going to pick 5 colors and plot them sequentially so we need 5*49 cols
par(mfrow=c(1,245),
    mar = c(0,0,0,0))

# We are also going to keep a running list of all the colors we generate
# during this loop for another visualization

col_pal_all <- c()

for(n in 1:49){
  picture_path <- paste0(all_exif[n,]$SourceFile)
  picture <- jpeg::readJPEG(picture_path)
  col_pal <- image_palette(picture, n=5)
  for (c in col_pal){
    plot_blank()
    plot_background(col=c)
  }
  col_pal_all <- append(col_pal_all, col_pal)
}

With all our colors in the single vector col_pal_all, we can sort them into an order to make a different visualization. However, sorting colors is a non-trivial problem, which this blog post dives into. To avoid this complexity we are going to use the lterpalettefinder library, which has a color sorting function we can use for now.

library(lterpalettefinder)

col_pal_sorted <- palette_sort(col_pal_all)

par(mfrow=c(1,245),
    mar = c(0,0,0,0))

for (c in col_pal_sorted){
  plot_blank()
  plot_background(col=c)
}
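
To give a feel for why sorting colors is tricky, here is a small sketch of one simple ordering: sorting by hue using base R. This is only one of many possible orderings and is not necessarily what palette_sort() does; the four example colors are made up.

```r
# Four made-up colors: red, blue, green and yellow
cols <- c("#FF0000", "#0000FF", "#00FF00", "#FFFF00")

# col2rgb() gives a matrix of red/green/blue values (one column per color),
# and rgb2hsv() converts that to hue, saturation and value
hsv_values <- rgb2hsv(col2rgb(cols))

# Order the colors by their position on the color wheel (hue)
sorted_by_hue <- cols[order(hsv_values["h", ])]
```

Sorting by hue alone ignores how light or saturated a color is, so dark and pale versions of the same hue end up mixed together. That is part of what makes a "good" color sort hard to define.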

OCR

Finally, we can extract data about what is contained in the images themselves, for example all of the text. To do this we are going to use a form of machine learning called OCR, or Optical Character Recognition.

To do this we are going to use the tesseract library, which wraps the Tesseract OCR engine.

library(tesseract)

text_data <- data.frame()

# Unlike all our other loops, for this task we need higher resolution images,
# so we are loading in our pictures from a different directory

for(n in 1:49){
  picture_path <- paste0('media/og_photos/', n, '.jpeg')
  picture <- image_read(picture_path)
  # run OCR with the English language model (one row per detected word)
  picture_text <- image_ocr_data(picture, language = "eng")
  picture_text$source <- all_exif[n,]$SourceFile
  text_data <- rbind(text_data, picture_text)
}

Figure 8.1: Note that this step requires higher resolution images, which you need to download from the class Teams channel.

Looking at the text data that has been extracted from the images, we can see that lots of it is unusable!

…1   word   confidence   bbox                 source
1    i      50.11905     2590,0,2613,89       media/photos/1.jpeg
2    !      35.61607     2655,8,2676,75       media/photos/1.jpeg
3    “Vb    64.10910     2745,0,2814,41       media/photos/1.jpeg
4    J      95.34794     2892,0,2930,98       media/photos/1.jpeg
5    A      25.13509     2949,0,2995,43       media/photos/1.jpeg
6    oe     40.92468     2446,99,2500,116     media/photos/1.jpeg

To address this, we are going to filter out everything with a confidence level of 90 or below:

ninty_plus <- text_data[which(text_data$confidence > 90),]

…1    word   confidence   bbox                   source
4     J      95.34794     2892,0,2930,98         media/photos/1.jpeg
29    |      90.71518     1867,1123,1877,1260    media/photos/1.jpeg
83    |      90.63419     1234,1856,1237,1891    media/photos/1.jpeg
133   |      90.00334     654,2151,658,2174      media/photos/1.jpeg
134   |      90.47010     791,2088,798,2192      media/photos/1.jpeg
161   =      92.46490     2321,2254,2347,2258    media/photos/1.jpeg

This is looking better, but there is still lots of useless data so we are only going to keep ‘words’ with more than 1 character.

ninty_plus_words <- ninty_plus[which(nchar(ninty_plus$word) > 1),]

…1     word         confidence   bbox                   source
2274   SS           90.11664     1084,1280,1155,1323    media/photos/5.jpeg
2357   PASSENCERS   91.67908     938,1690,1108,1719     media/photos/5.jpeg
2358   MUST         96.32262     1124,1697,1193,1713    media/photos/5.jpeg
2359   NOT          96.32262     1209,1701,1255,1715    media/photos/5.jpeg
2373   CROSS        96.32278     898,1719,987,1751      media/photos/5.jpeg
2374   THE          96.39224     1005,1723,1057,1737    media/photos/5.jpeg
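
The two filtering steps above can also be combined into one. Here is a minimal sketch on a small made-up data frame (ocr_example and its values are invented for illustration, loosely based on the output above), so it runs without the photographs or the OCR step.

```r
# A made-up stand-in for text_data, with only the two columns we filter on
ocr_example <- data.frame(
  word       = c("i", "|", "MUST", "NOT", "SS", "=", "CROSS"),
  confidence = c(50.1, 90.7, 96.3, 96.3, 90.1, 92.5, 40.0),
  stringsAsFactors = FALSE
)

# TRUE only where the confidence is above 90 AND the word has more than
# one character
keep <- ocr_example$confidence > 90 & nchar(ocr_example$word) > 1

clean_words <- ocr_example[keep, ]
```

Note that a high confidence score does not guarantee a real word: "SS" survives both filters here, just as it does in our real data.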

Now that we have the words we extracted from our images, we can create a data visualization from them. As both a bit of exploratory data analysis (EDA) and a straightforward bit of code, I’m going to make a wordcloud.

library(wordcloud2)

# all we have to do is count up how many times each word occurs 

wordcloud_data <- as.data.frame(table(ninty_plus_words[,c("word")] ))

# voila! 

wordcloud2(wordcloud_data)
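
One thing to watch when counting words is that "THE" and "the" are counted separately. Here is a small sketch, using a made-up vector of words, of putting everything into upper case before counting and then sorting so the most frequent words come first. The column names word and freq are my own choice for readability.

```r
# Made-up words, standing in for ninty_plus_words$word
words <- c("MUST", "Must", "THE", "the", "THE", "CROSS")

# Convert to upper case first so different capitalizations are counted together
word_counts <- as.data.frame(table(toupper(words)), stringsAsFactors = FALSE)
names(word_counts) <- c("word", "freq")

# Sort from most to least frequent
word_counts <- word_counts[order(-word_counts$freq), ]
```

A data frame like this, with the words in the first column and their counts in the second, can be passed to wordcloud2() in the same way as wordcloud_data.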

Class Challenge!

I created this dataset without a motivation of linking the data I collected to a larger issue. However, we could think about some ways in which this could be made to link the personal and political. To trial this out I have linked the data we have just worked with to some external datasets.

As a challenge for this class I would like you to download this csv and see what data visualizations you can produce.

I have included 5 new variables linked via the location each picture was taken. These are:

  • Average house price.
  • Average time it takes to walk to a hospital.
  • Index of Multiple Deprivation by rank and decile.
  • Urban/rural classification.

Work with the data however you want!

You can download the .csv of all the EXIF data here or a copy of all the photographs here.

Please also feel free to merge in, add to, mash up, or play around with any of the data!
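
If you would like a starting point, here is a rough sketch of merging two tables on a shared column and summarizing one variable by another. Everything in it is made up for illustration: the data frames, the urban_rural column name and all of the values are hypothetical, so check the real column names in the csv before adapting it.

```r
# Made-up stand-in for our per-picture data
photos <- data.frame(
  SourceFile   = paste0("media/photos/", 1:4, ".jpeg"),
  time_between = c(310, 285, 335, 290)
)

# Made-up stand-in for an external dataset linked to each picture
areas <- data.frame(
  SourceFile  = paste0("media/photos/", 1:4, ".jpeg"),
  urban_rural = c("rural", "rural", "urban", "urban")
)

# merge() joins the two tables wherever SourceFile matches
linked <- merge(photos, areas, by = "SourceFile")

# Average gap between pictures for each classification
mean_gap <- tapply(linked$time_between, linked$urban_rural, mean)
```

From here, a summary like mean_gap could be passed to barplot(), or the linked data frame could be used with any of the plotting approaches we have covered.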

As ever, please ask us questions!