Alignment of Data to ArkeoGIS Using the R Software

This tutorial aims to serve as an example for automating the alignment of data from a database deposited on the Heurist platform, to the ArkeoGIS thesaurus using the free software R. It is necessary to know a minimum of this software to be able to follow this tutorial. This document is based on the database of a PhD thesis (Scholtus 2021).
Author

Lizzie Scholtus

Published

February 24, 2023


# Libraries used in this document:

library(kableExtra)
library(plyr)
library(dplyr)
library(stringr)
library(rlist)
library(tidyr)


1 Original data presentation


The dataset used here comes from a database hosted on the site Heurist which allows the creation of relational databases and the export of cross-referenced data in csv format. In this export, Table 1, we find information relating to the objects and their typologies, as well as to the sites on which they were discovered. In addition, some information, concerning more specifically the sites, has not been imported into this database, but is kept in Access format. These elements have been exported in csv format (Table 2) and can be linked to the objects table thanks to the site identifier.

All the elements required for ArkeoGIS are therefore already present in these two datasets, but not necessarily in the right columns or in the right format. These tables must therefore be assembled and cleaned up, then aligned with the site-specific thesaurus ArkeoGIS before being deposited.

Note

The data imported here come from a database in French and therefore are imported using ArkeoGIS french thesaurus. But it is possible to do this in all the languages proposed by ArkeoGIS.


# Import of both data sets into R

objets <- read.csv2("data/Objet.csv", sep = ",", encoding = "UTF-8")
sites <- read.csv2("data/Site.csv", sep = ";", encoding = "UTF-8")
Table 1: Export of the objects table from Heurist including related information
Objet.H.ID Unique.public.identifier catégorie.objet Materiaux Site.objet.H.ID Unique.public.identifier.1 Lieu.dit Commune.H.ID Name...Title Mappable.location..geospatial. Type.objet.H.ID rec_Title Datation.type.H.ID Name...Title.1 Bibliographie..Référence.biblio..H.ID rec_Title.1
42244 APP0001 harnais Alliage cuivreux 6929 SIT0092 Augusta Raurica 5906 Augst POINT(7.71468 47.53557) 7437 harnais - Artefacts APH-4000, Bishop 1988 7026|7027|7028 0 : 25 apr|25 : 50 apr|50 : 75 apr 201387|181260 Artefacts APH-4068|Deschler-Erb 1999, n° 386
42245 APP0010 harnais Alliage cuivreux 6545 SIT0575 Rheingönheim 5136 Ludwigshafen am Rhein POINT(8.41944 49.44514) 7437 harnais - Artefacts APH-4000, Bishop 1988 7026|7027|7028 0 : 25 apr|25 : 50 apr|50 : 75 apr 203143|201387|193274 Bishop 1992, fig. 15, n°1-2|Artefacts APH-4068|Ulbert 1969, Pl.28 n°4
42246 APP0100 harnais Alliage cuivreux 6918 SIT0091 Vindonissa 5895 Windisch an der Aare POINT(8.21842 47.47899) 7447 harnais - Artefacts APH-4000, Bishop 1988 7030|7031|7032|7033 100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr 193339|201165 Unz, Deschler-Erb 1997, pl. 63, n° 1812|Artefacts APH-4051


Table 2: Export of the sites table from Access
ID_SITE Centroide Etat.de.la.recherche Occupation Datation Bibliographie Activités.artisanales Type.de.site
SIT0001 VRAI Fouillé Non renseigné IVe s. av. - IVe s. apr. R. Cordie-Hachenberg, A. Haffner 1991, Haffner 1971 Funéraire
SIT0002 VRAI Non renseigné Non renseigné Habitat indéterminé
SIT0003 VRAI Non renseigné Non renseigné Indéterminé

2 Table cleaning


The first step in preparing the dataset is to clean the table by removing unnecessary columns. This operation is obviously different for each dataset. Here in particular, the unique identifiers used to link the different tables in Heurist, i.e. all the columns containing the information H.ID in their title need to be removed. It is possible to remove all columns ending with the letters ID with the help of a regular expression (REGEX).


# Suppress unused columns with a regular expression

n_objet <- as.data.frame(objets[,-grep(".*ID$", colnames(objets))])
Table 3: Table objects after removing non-useful columns
Unique.public.identifier catégorie.objet Materiaux Unique.public.identifier.1 Lieu.dit Name...Title Mappable.location..geospatial. rec_Title Name...Title.1 rec_Title.1
APP0001 harnais Alliage cuivreux SIT0092 Augusta Raurica Augst POINT(7.71468 47.53557) harnais - Artefacts APH-4000, Bishop 1988 0 : 25 apr|25 : 50 apr|50 : 75 apr Artefacts APH-4068|Deschler-Erb 1999, n° 386
APP0010 harnais Alliage cuivreux SIT0575 Rheingönheim Ludwigshafen am Rhein POINT(8.41944 49.44514) harnais - Artefacts APH-4000, Bishop 1988 0 : 25 apr|25 : 50 apr|50 : 75 apr Bishop 1992, fig. 15, n°1-2|Artefacts APH-4068|Ulbert 1969, Pl.28 n°4
APP0100 harnais Alliage cuivreux SIT0091 Vindonissa Windisch an der Aare POINT(8.21842 47.47899) harnais - Artefacts APH-4000, Bishop 1988 100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr Unz, Deschler-Erb 1997, pl. 63, n° 1812|Artefacts APH-4051


3 Transform the table to ArkeoGIS format


For the dataset to be compatible with the ArkeoGIS platform, it must be composed of the following 22 columns. A specific value and type is defined for each (see the online manual for more information). In the table used for this example, while some columns are already in the correct format, others will require some manipulations.

Tip

The data model is downloadable here


# Import of the ArkeoGIS data model

arkeoGIS <- read.csv2("data/Model.csv", sep = ";")
 [1] "SITE_SOURCE_ID"     "SITE_NAME"          "MAIN_CITY_NAME"    
 [4] "GEONAME_ID"         "PROJECTION_SYSTEM"  "LONGITUDE"         
 [7] "LATITUDE"           "ALTITUDE"           "CITY_CENTROID"     
[10] "OCCUPATION"         "STATE_OF_KNOWLEDGE" "STARTING_PERIOD"   
[13] "ENDING_PERIOD"      "CARAC_NAME"         "CARAC_LVL1"        
[16] "CARAC_LVL2"         "CARAC_LVL3"         "CARAC_LVL4"        
[19] "CARAC_EXP"          "BIBLIOGRAPHY"       "COMMENTS"          
[22] "WEB_IMAGES"        


3.1 Columns to be renamed


No changes are necessary for a number of columns. They just need to be rename according to the table provided in the ArkeoGIS manual.

  • SITE_SOURCE_ID: this column, designates the unique identifier referring to the site on which the object was discovered. Here, this information is found in the column Unique.public.identifier.1, with the column Unique.public.identifier indicating the unique identifier of the object itself.

  • SITE_NAME: the name of the site is in the column Lieu.dit.

  • MAIN_CITY_NAME: the municipality where the site is located is in the column Name…Title.

  • BIBLIOGRAPHY: the bibliography related to the object is in the column rec_Title.1.


# Rename existing columns

colnames(n_objet)[c(4:6, 10)] <- colnames(arkeoGIS[c(1:3, 20)])
Table 4: Table objects after renaming some of the columns
Unique.public.identifier catégorie.objet Materiaux SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME Mappable.location..geospatial. rec_Title Name...Title.1 BIBLIOGRAPHY
APP0001 harnais Alliage cuivreux SIT0092 Augusta Raurica Augst POINT(7.71468 47.53557) harnais - Artefacts APH-4000, Bishop 1988 0 : 25 apr|25 : 50 apr|50 : 75 apr Artefacts APH-4068|Deschler-Erb 1999, n° 386
APP0010 harnais Alliage cuivreux SIT0575 Rheingönheim Ludwigshafen am Rhein POINT(8.41944 49.44514) harnais - Artefacts APH-4000, Bishop 1988 0 : 25 apr|25 : 50 apr|50 : 75 apr Bishop 1992, fig. 15, n°1-2|Artefacts APH-4068|Ulbert 1969, Pl.28 n°4
APP0100 harnais Alliage cuivreux SIT0091 Vindonissa Windisch an der Aare POINT(8.21842 47.47899) harnais - Artefacts APH-4000, Bishop 1988 100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr Unz, Deschler-Erb 1997, pl. 63, n° 1812|Artefacts APH-4051


3.2 Columns to be created without manipulation


In order to match ArkeoGIS data structure, some columns and their data need to be added as they are not yet present in the dataset.

  • The column GEONAME_ID is used to retrieve sites coordinates for which information would not be available in the dataset. It is not the case here. This column should therefore be added, but without new data.
Tip

The column GEONAME_ID indicates the unique identifier of a location which can be found on the Geonames website. It is also possible to export and align locations directly to this repository (see Marlet et al. (2022)).

  • The column ALTITUDE contains information that is not available in this dataset.

  • The column PROJECTION_SYSTEM indicates the projection system in which the coordinates are given, using its EPSG code. In this case, it is WGS84 with the code 4326.

Tip

There is an EPSG code for each projection system. They can be found at Wikipedia or at epsg.

  • The column CARAC_NAME specifies in which part of ArkeoGIS thesaurus the described row refers (Real Estate, Furniture, Production, Landscape, Analysis). In the data used here, only furniture is found.

  • The column CARAC_EXP indicates whether the feature described is exceptional or not.

  • The column WEB_IMAGES indicates a link to a url allowing access to illustrations in jpg, png or gif format.


# Create the columns to be added and their contents

n_objet[, colnames(arkeoGIS[c(4,8)])] <- NA
n_objet[, colnames(arkeoGIS[5])] <- "4326"
n_objet[, colnames(arkeoGIS[14])] <- "Mobilier"
n_objet[, colnames(arkeoGIS[19])] <- "Non"
n_objet[, colnames(arkeoGIS[22])] <- NA
Table 5: Adding new columns to the table objects
Unique.public.identifier catégorie.objet Materiaux SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME Mappable.location..geospatial. rec_Title Name...Title.1 BIBLIOGRAPHY GEONAME_ID ALTITUDE PROJECTION_SYSTEM CARAC_NAME CARAC_EXP WEB_IMAGES
APP0001 harnais Alliage cuivreux SIT0092 Augusta Raurica Augst POINT(7.71468 47.53557) harnais - Artefacts APH-4000, Bishop 1988 0 : 25 apr|25 : 50 apr|50 : 75 apr Artefacts APH-4068|Deschler-Erb 1999, n° 386 NA NA 4326 Mobilier Non NA
APP0010 harnais Alliage cuivreux SIT0575 Rheingönheim Ludwigshafen am Rhein POINT(8.41944 49.44514) harnais - Artefacts APH-4000, Bishop 1988 0 : 25 apr|25 : 50 apr|50 : 75 apr Bishop 1992, fig. 15, n°1-2|Artefacts APH-4068|Ulbert 1969, Pl.28 n°4 NA NA 4326 Mobilier Non NA
APP0100 harnais Alliage cuivreux SIT0091 Vindonissa Windisch an der Aare POINT(8.21842 47.47899) harnais - Artefacts APH-4000, Bishop 1988 100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr Unz, Deschler-Erb 1997, pl. 63, n° 1812|Artefacts APH-4051 NA NA 4326 Mobilier Non NA


3.3 Columns according to another table


The sites table contains the information needed to complete the data in the ArkeoGIS CITY_CENTROID, OCCUPATION, STATE_OF_KNOWLEDGE columns, which are relative to the discovery site and not to the object itself. To do this, the objects table and the sites table must be merged according to the unique identifier of the sites, allowing to repeat the information for each object discovered on the same site. However, before the two datasets can be merged, the two columns containing the unique site identifier, making the link possible, must be named in the same way.


# Rename the SiteID column in the `sites` table

colnames(sites)[1] <- "SITE_SOURCE_ID"

# Merging datasets

n_objet <- droplevels(merge(sites[,1:4], n_objet, by = "SITE_SOURCE_ID"))

# Renaming columns for ArkeoGIS

colnames(n_objet)[2:4] <- colnames(arkeoGIS[c(9, 11, 10)])
Table 6: Add columns from table sites to table objects
SITE_SOURCE_ID CITY_CENTROID STATE_OF_KNOWLEDGE OCCUPATION Unique.public.identifier catégorie.objet Materiaux SITE_NAME MAIN_CITY_NAME Mappable.location..geospatial. rec_Title Name...Title.1 BIBLIOGRAPHY GEONAME_ID ALTITUDE PROJECTION_SYSTEM CARAC_NAME CARAC_EXP WEB_IMAGES
SIT0001 VRAI Fouillé Non renseigné FIB2747 fibule Alliage cuivreux Wederath-Belginum Wederath POINT(7.16444 49.86499) fibule - Gaspar 2007 type 37a|fibule - Feugère 1985 type 26b1|fibule - Riha 1994 type 5.17.3 25 : 50 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr Haffner 1971, 20-21, Taf. 14|Artefacts FIB-4518 NA NA 4326 Mobilier Non NA
SIT0001 VRAI Fouillé Non renseigné PER1594 perle Verre Wederath-Belginum Wederath POINT(7.16444 49.86499) perle - Riha 1990 type 11.1.1 25 : 50 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr Cordie-Hackenberg, Haffner 1997, Taf. 417, j|Artefacts PRL-4001 NA NA 4326 Mobilier Non NA
SIT0001 VRAI Fouillé Non renseigné FIB2751 fibule Alliage cuivreux Wederath-Belginum Wederath POINT(7.16444 49.86499) fibule - Gaspar 2007 type 50|fibule - Feugère 1985 type 26c2|fibule - Riha 1994 type 7.16 50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr Artefacts FIB-4726|Cordie-Hackenberg, Haffner 1997, pl. 618, k NA NA 4326 Mobilier Non NA


The three imported columns are defined by fixed predefined fields in the ArkeoGIS manual. It is therefore necessary to check if they correspond to these, and if necessary to modify them accordingly.

  • The CITY_CENTROID column works on an yes-no basis. However, the import into R has changed these values to TRUE-FALSE.
# Change the values of the CITY_CENTROID column

n_objet$CITY_CENTROID <- factor(n_objet$CITY_CENTROID)
levels(n_objet$CITY_CENTROID) <- c("Non", "Oui")

Table 7: Column content CITY_CENTROID

(a) Before
Var1 Freq
FAUX 11472
VRAI 18980
(b) After
Var1 Freq
Non 11472
Oui 18980


Note

The values in this column can be correlated with the use of geonames in the column GEONAME_ID, to define the coordinates of a site. In the example used here no geonames are used, however a number of sites are located by non-precise coordinates and this information is reported in this column.


  • The column STATE_OF_KNOWLEDGE only accepts a certain number of specific entries and does not take into account empty cells. It is therefore necessary to replace these with Not documented. It is also necessary to homogenise the entries for Foot Survey, which is written in three different ways. The use of a regular expression allows this change in a single line.


# Replace empty cells with "Not documented".

n_objet$STATE_OF_KNOWLEDGE[n_objet$STATE_OF_KNOWLEDGE == ""] <- "Non renseigné"

# Homogenise the term "Pedestrian Survey".

n_objet$STATE_OF_KNOWLEDGE <- sub("^Prospect.+$", "Prospecté pédestre", n_objet$STATE_OF_KNOWLEDGE)

Table 8: Content of column STATE_OF_KNOWLEDGE

(a) Before
Var1 Freq
27
Fouillé 7833
Littérature 104
Non renseigné 21661
Prospecté 91
Prospecté pédestre 511
Prospection pédestre 4
Sondé 221
(b) After
Var1 Freq
Fouillé 7833
Littérature 104
Non renseigné 21688
Prospecté pédestre 606
Sondé 221


  • The column OCCUPATION is also structured by a number of predefined entries. Here, it is only necessary to replace empty rows with documented.


# Replace empty cells with "Not documented".

n_objet$OCCUPATION[n_objet$OCCUPATION == ""] <- "Non renseigné"

Table 9: Content of column OCCUPATION

(a) Before
Var1 Freq
27
Continue 1332
Multiple 2
Non renseigné 27135
Unique 1956
(b) After
Var1 Freq
Continue 1332
Multiple 2
Non renseigné 27162
Unique 1956


3.4 Columns to be created from data present in other columns


For some columns, the information required for the import into ArkeoGIS is already present in the table, but is not in the right format or is mixed with other elements. It is therefore necessary to recover, modify and integrate it into the correct columns.


3.4.1 Coordinates


It is particularly the case here for the sites coordinates. This information is given in the column Mappable.location..geospatial. as geospatial data (Table 10 longitudes and latitudes are gathered in the same column). They should be separated and inserted in two different columns.

Table 10: Coordinates in the source data
x
POINT(7.16444 49.86499)
POINT(7.16444 49.86499)
POINT(7.16444 49.86499)
POINT(7.16444 49.86499)
POINT(7.16444 49.86499)
POINT(7.16444 49.86499)
# Create a table to store the coordinates
coord <- data.frame(LONGITUDE = numeric(), LATITUDE = numeric())

# Get the coordinates using a regular expression and store them in the `coord` table
coord <- strcapture("POINT\\((\\d+\\.\\d+)\\s(\\d+\\.\\d+)\\)", n_objet$Mappable.location..geospatial., coord)
summary(coord)
   LONGITUDE        LATITUDE    
 Min.   :4.920   Min.   :46.23  
 1st Qu.:6.173   1st Qu.:47.54  
 Median :6.956   Median :48.31  
 Mean   :6.966   Mean   :48.45  
 3rd Qu.:7.715   3rd Qu.:49.19  
 Max.   :9.028   Max.   :50.44  
 NA's   :8       NA's   :8      
# Gather the `objects` table and the `coord` table (This operation works because 
# the order of the rows has not been changed)
# Then delete the Mappable.location..geospatial column.
n_objet <- cbind(n_objet, coord)
n_objet <- n_objet[,-10]
Table 11: Add the correct coordinate columns to the objects table
SITE_SOURCE_ID CITY_CENTROID STATE_OF_KNOWLEDGE OCCUPATION Unique.public.identifier catégorie.objet Materiaux SITE_NAME MAIN_CITY_NAME rec_Title Name...Title.1 BIBLIOGRAPHY GEONAME_ID ALTITUDE PROJECTION_SYSTEM CARAC_NAME CARAC_EXP WEB_IMAGES LONGITUDE LATITUDE
SIT0001 Oui Fouillé Non renseigné FIB2747 fibule Alliage cuivreux Wederath-Belginum Wederath fibule - Gaspar 2007 type 37a|fibule - Feugère 1985 type 26b1|fibule - Riha 1994 type 5.17.3 25 : 50 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr Haffner 1971, 20-21, Taf. 14|Artefacts FIB-4518 NA NA 4326 Mobilier Non NA 7.16444 49.86499
SIT0001 Oui Fouillé Non renseigné PER1594 perle Verre Wederath-Belginum Wederath perle - Riha 1990 type 11.1.1 25 : 50 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr Cordie-Hackenberg, Haffner 1997, Taf. 417, j|Artefacts PRL-4001 NA NA 4326 Mobilier Non NA 7.16444 49.86499
SIT0001 Oui Fouillé Non renseigné FIB2751 fibule Alliage cuivreux Wederath-Belginum Wederath fibule - Gaspar 2007 type 50|fibule - Feugère 1985 type 26c2|fibule - Riha 1994 type 7.16 50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr Artefacts FIB-4726|Cordie-Hackenberg, Haffner 1997, pl. 618, k NA NA 4326 Mobilier Non NA 7.16444 49.86499


3.4.2 Dates


The dating information is found, in the objects table within a single column Name…Title.1 (Table 12), grouped by the phases of existence of the artefact. It is therefore necessary to separate them by retrieving the first phase for the column STARTING_PERIOD and the last phase for the column ENDING_PERIOD. However, some rows have only one existence period, representing both the start and end date, which requires additional manipulation. Also, as some rows are actually empty, it is not possible to simply replace the missing data from the first application of the strcapture function with that from the second.

Table 12: Dating in source data
x
25 : 50 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr
25 : 50 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr
50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr
25 : 50 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr
25 : 50 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr
25 : 50 apr|50 : 75 apr|75 : 100 apr|100 : 150 apr|150 : 200 apr|200 : 250 apr|250 : 300 apr
# Get the start and end dates using a regular expression 
# and store them in separate columns

date <- data.frame(STARTING_PERIOD = character(), ENDING_PERIOD = character())
date <- strcapture(
  "^(\\d+\\s:\\s\\d+\\s\\w+)\\s?\\w+?\\|?.*\\|(\\d+\\s:\\s\\d+\\s\\w+)\\s?\\w+?$", 
  n_objet[,11], date)

# Retrieve unique dates

date2 <- data.frame(STARTING_PERIOD = character())
date2 <- strcapture("^(\\d+\\s:\\s\\d+\\s\\w+)\\s?\\w+?$", n_objet[,11], date2)

# Duplicate the column to have an end date

date2$ENDING_PERIOD <- date2$STARTING_PERIOD

# Add an identical unique identifier to both tables
# This works because missing data is indicated by NAs 
# and because the order of the data is maintained

date$ID <- paste0("ID", 1:nrow(date))
date2$ID <- paste0("ID", 1:nrow(date2))

# Join the two tables by replacing the missing data in the `date` table 
# with data from the `date2` table

date3 <- dplyr::left_join(date, date2, by = "ID") %>%
  ## This operation creates new columns STARTING_PERIOD and ENDING_PERIOD x and y
  
  # Replace the NAs with the data available in the y columns
  dplyr::mutate(STARTING_PERIOD = ifelse(is.na(STARTING_PERIOD.x), 
                                         STARTING_PERIOD.y, STARTING_PERIOD.x)) %>%
  dplyr::mutate(ENDING_PERIOD = ifelse(is.na(ENDING_PERIOD.x), 
                                       ENDING_PERIOD.y, ENDING_PERIOD.x)) %>%
  
  # Delete excess columns
  dplyr::select(-STARTING_PERIOD.y, -STARTING_PERIOD.x) %>%
  dplyr::select(-ENDING_PERIOD.y, -ENDING_PERIOD.x) %>%
  
  # And replace the remaining NA with "Undetermined"
  replace(is.na(.), "Indéterminé")

# Delete ID column

date3 <- date3[,-1]
Table 13: Results of the date transformation
STARTING_PERIOD ENDING_PERIOD
25 : 50 ap 250 : 300 ap
25 : 50 ap 250 : 300 ap
50 : 75 ap 150 : 200 ap
25 : 50 ap 250 : 300 ap
25 : 50 ap 250 : 300 ap
25 : 50 ap 250 : 300 ap


These dates should also be transformed into ArkeoGIS format by removing the apr or ap and replacing the av with a - symbol. Finally, missing data should be signaled with the Undetermined mention.

# Replace "av" or "apr" in the STARTING_PERIOD column

date3$STARTING_PERIOD <- gsub("(\\d+)\\s:\\s(\\d+)\\sap.*", "\\1:\\2", 
                              date3$STARTING_PERIOD)
date3$STARTING_PERIOD <- gsub("(\\d+)\\s:\\s(\\d+)\\sa.*", "-\\1:-\\2", 
                              date3$STARTING_PERIOD)

# Replace "av" or "apr" in the ENDING_PERIOD column

date3$ENDING_PERIOD <- gsub("(\\d+)\\s:\\s(\\d+)\\sap.*", "\\1:\\2", 
                            date3$ENDING_PERIOD)
date3$ENDING_PERIOD <- gsub("(\\d+)\\s:\\s(\\d+)\\sa.*", "-\\1:-\\2", 
                            date3$ENDING_PERIOD)

Table 14: The different dating possibilities after modification

(a) STARTING_PERIOD
Var1 Freq
-125:-75 1238
-150:-125 13505
-175:-150 10
-200:-175 283
-225:-200 205
-25:-0 1497
-260:-225 293
-50:-25 192
-75:-50 227
0:25 3871
100:150 646
150:200 656
200:250 1157
25:50 1859
250:300 1056
300:350 1674
350:400 781
50:75 1033
75:100 261
Indéterminé 8
(b) ENDING_PERIOD
Var1 Freq
-125:-75 524
-150:-125 236
-175:-150 365
-200:-175 5
-225:-200 37
-25:-0 274
-260:-225 10
-50:-25 13541
-75:-50 183
0:25 141
100:150 1475
150:200 916
200:250 975
25:50 398
250:300 2906
300:350 1438
350:400 3663
50:75 570
75:100 2787
Indéterminé 8


The dates are now ready to be integrated into the objects table. However, it is also possible to modify them again to match the chronology of continental Europe used in ArkeoGIS1. For C.E. dates, the dates in this work are based on the same chronology (Barral and Fichtl (2012)), but differ from ArkeoGIS by one year (see Table 15). For the AD dates, the system used in the source data corresponds to the Gregorian calendar, but can partly be linked to the ArkeoGIS chronology by also changing the one-year boundaries.

Table 15: Extract from the chronology used in ArkeoGIS for continental Europe
IdArkeogis PERIOD_NAME_LVL1 START_LVL1 STOP_LVL1 PERIOD_NAME_LVL2 START_LVL2 STOP_LVL2 PERIOD_NAME_LVL3 START_LVL3 STOP_LVL3 PERIOD_NAME_LVL4 START_LVL4 STOP_LVL4 IdArkPeriodo IdArkPactols X
38 2160 NA NA La Tène A -480 -381 NA NA NA NA http://n2t.net/ark:/99152/p09hq4ntk3v https://ark.frantiq.fr/ark:/26678/pcrtbm5GVtzrbP
39 2161 NA NA NA NA La Tène A1 -480 -421 NA NA http://n2t.net/ark:/99152/p09hq4nndw9
40 2162 NA NA NA NA NA NA La Tène A1a -480 -451 http://n2t.net/ark:/99152/p09hq4n7gkg
41 2163 NA NA NA NA NA NA La Tène A1b -450 -421 http://n2t.net/ark:/99152/p09hq4n43g3
42 2164 NA NA NA NA La Tène A2 -420 -381 NA NA http://n2t.net/ark:/99152/p09hq4n54cn
43 2165 NA NA La Tène B -380 -261 NA NA NA NA http://n2t.net/ark:/99152/p09hq4nj4h9 https://ark.frantiq.fr/ark:/26678/pcrtpz5a27rtgG
Tip

The different chronologies used in ArkeoGIS are listed here. Those with a DOI can be downloaded by following the DOI link or by going directly to the repository on Nakala.


# Modify dates to match the timeline used in ArkeoGIS

for (i in 1:ncol(date3)) {
  
  date3[,i] <- as.factor(date3[,i])
  
  levels(date3[,i]) <- c("-125:-76", "-150:-126", "-175:-151", "-200:-176", "-225:-201", 
                         "-25:-1", "-260:-226", "-50:-26", "-75:-51", "1:25", "101:150", 
                         "151:200", "201:250", "26:50", "251:300", "301:350", "351:400", 
                         "51:75", "76:100", "Indéterminé")
}

# Gather the `objects` table and the `date3` table 
# (this works because the rank order has not been changed)
# Then delete the column Name...Title.1

n_objet <- cbind(n_objet, date3)
n_objet <- n_objet[,-11]

Table 16: Modify and add dates to the dataset

(a) Extract from the dating columns
STARTING_PERIOD ENDING_PERIOD
15 -75:-51 -50:-26
16 -75:-51 -50:-26
17 -125:-76 -75:-51
18 -75:-51 -50:-26
19 26:50 251:300
20 -25:-1 -25:-1
(b) Adding columns to the object table
SITE_SOURCE_ID CITY_CENTROID STATE_OF_KNOWLEDGE OCCUPATION Unique.public.identifier catégorie.objet Materiaux SITE_NAME MAIN_CITY_NAME rec_Title BIBLIOGRAPHY GEONAME_ID ALTITUDE PROJECTION_SYSTEM CARAC_NAME CARAC_EXP WEB_IMAGES LONGITUDE LATITUDE STARTING_PERIOD ENDING_PERIOD
SIT0001 Oui Fouillé Non renseigné FIB2747 fibule Alliage cuivreux Wederath-Belginum Wederath fibule - Gaspar 2007 type 37a|fibule - Feugère 1985 type 26b1|fibule - Riha 1994 type 5.17.3 Haffner 1971, 20-21, Taf. 14|Artefacts FIB-4518 NA NA 4326 Mobilier Non NA 7.16444 49.86499 26:50 251:300
SIT0001 Oui Fouillé Non renseigné PER1594 perle Verre Wederath-Belginum Wederath perle - Riha 1990 type 11.1.1 Cordie-Hackenberg, Haffner 1997, Taf. 417, j|Artefacts PRL-4001 NA NA 4326 Mobilier Non NA 7.16444 49.86499 26:50 251:300
SIT0001 Oui Fouillé Non renseigné FIB2751 fibule Alliage cuivreux Wederath-Belginum Wederath fibule - Gaspar 2007 type 50|fibule - Feugère 1985 type 26c2|fibule - Riha 1994 type 7.16 Artefacts FIB-4726|Cordie-Hackenberg, Haffner 1997, pl. 618, k NA NA 4326 Mobilier Non NA 7.16444 49.86499 51:75 151:200


3.4.3 Comments


The comment column does not yet exist in the objects table. It is created according to the information contained in other columns and in particular catégorie.objet, Materiaux, rec_Title. Moreover, in this last column, which corresponds to the typology of the objects, certain information is redundant and it is necessary to clean it up before being able to concatenate everything.

# Clean up the `rec_Title` column

n_objet[,10] <- gsub("\\w+\\s-\\s", "", n_objet[,10])
n_objet[,10] <- gsub("\\|", ", ", n_objet[,10])

Table 17: Content of the rec_Title column

(a) Before
x
fibule - Gaspar 2007 type 37a|fibule - Feugère 1985 type 26b1|fibule - Riha 1994 type 5.17.3
perle - Riha 1990 type 11.1.1
fibule - Gaspar 2007 type 50|fibule - Feugère 1985 type 26c2|fibule - Riha 1994 type 7.16
fibule - Gaspar 2007 type 37a|fibule - Feugère 1985 type 26b1|fibule - Riha 1994 type 5.17.3
perle - Riha 1990 type 11.1.1
perle - Riha 1990 type 11.1.1
(b) After
x
Gaspar 2007 type 37a, Feugère 1985 type 26b1, Riha 1994 type 5.17.3
Riha 1990 type 11.1.1
Gaspar 2007 type 50, Feugère 1985 type 26c2, Riha 1994 type 7.16
Gaspar 2007 type 37a, Feugère 1985 type 26b1, Riha 1994 type 5.17.3
Riha 1990 type 11.1.1
Riha 1990 type 11.1.1


# Assemble the different columns into a new COMMENTS column 
# specifying the function of each element

n_objet$COMMENTS <- paste0("OBJET : ", n_objet$catégorie.objet, 
                           ". MATERIAU : ", n_objet$Materiaux, 
                           ". TYPO : ", n_objet$rec_Title, ".")
Table 18: Contents of the COMMENTS column
x
OBJET : fibule. MATERIAU : Alliage cuivreux. TYPO : Gaspar 2007 type 37a, Feugère 1985 type 26b1, Riha 1994 type 5.17.3.
OBJET : perle. MATERIAU : Verre. TYPO : Riha 1990 type 11.1.1.
OBJET : fibule. MATERIAU : Alliage cuivreux. TYPO : Gaspar 2007 type 50, Feugère 1985 type 26c2, Riha 1994 type 7.16.
OBJET : fibule. MATERIAU : Alliage cuivreux. TYPO : Gaspar 2007 type 37a, Feugère 1985 type 26b1, Riha 1994 type 5.17.3.
OBJET : perle. MATERIAU : Verre. TYPO : Riha 1990 type 11.1.1.
OBJET : perle. MATERIAU : Verre. TYPO : Riha 1990 type 11.1.1.


The columns used to create the comments are not deleted immediately, since they will be used to create the characterisation of the objects. Only the column containing the typology will no longer be useful.

n_objet <- n_objet[,-10]
Table 19: Add the column COMMENT to the object table
SITE_SOURCE_ID CITY_CENTROID STATE_OF_KNOWLEDGE OCCUPATION Unique.public.identifier catégorie.objet Materiaux SITE_NAME MAIN_CITY_NAME BIBLIOGRAPHY GEONAME_ID ALTITUDE PROJECTION_SYSTEM CARAC_NAME CARAC_EXP WEB_IMAGES LONGITUDE LATITUDE STARTING_PERIOD ENDING_PERIOD COMMENTS
SIT0001 Oui Fouillé Non renseigné FIB2747 fibule Alliage cuivreux Wederath-Belginum Wederath Haffner 1971, 20-21, Taf. 14|Artefacts FIB-4518 NA NA 4326 Mobilier Non NA 7.16444 49.86499 26:50 251:300 OBJET : fibule. MATERIAU : Alliage cuivreux. TYPO : Gaspar 2007 type 37a, Feugère 1985 type 26b1, Riha 1994 type 5.17.3.
SIT0001 Oui Fouillé Non renseigné PER1594 perle Verre Wederath-Belginum Wederath Cordie-Hackenberg, Haffner 1997, Taf. 417, j|Artefacts PRL-4001 NA NA 4326 Mobilier Non NA 7.16444 49.86499 26:50 251:300 OBJET : perle. MATERIAU : Verre. TYPO : Riha 1990 type 11.1.1.
SIT0001 Oui Fouillé Non renseigné FIB2751 fibule Alliage cuivreux Wederath-Belginum Wederath Artefacts FIB-4726|Cordie-Hackenberg, Haffner 1997, pl. 618, k NA NA 4326 Mobilier Non NA 7.16444 49.86499 51:75 151:200 OBJET : fibule. MATERIAU : Alliage cuivreux. TYPO : Gaspar 2007 type 50, Feugère 1985 type 26c2, Riha 1994 type 7.16.


3.4.4 Bilbiography


The column BIBLIOGRAPHY has already been renamed in a previous step (see Section 3.1). However, it needs a slight formatting and the addition of the reference to the source database.

# Clean the BIBLIOGRAPHY column

n_objet[,10] <- gsub("\\|", ", ", n_objet[,10])

# Add the reference of the source database

n_objet[,10] <- paste0("Scholtus 2021, ", n_objet$Unique.public.identifier, ", ", n_objet[,10])

Table 20: Formatting bibliographic information

(a) Before
x
Haffner 1971, 20-21, Taf. 14|Artefacts FIB-4518
Cordie-Hackenberg, Haffner 1997, Taf. 417, j|Artefacts PRL-4001
Artefacts FIB-4726|Cordie-Hackenberg, Haffner 1997, pl. 618, k
Haffner 1971, 20-21, Taf. 14|Artefacts FIB-4518
Cordie-Hackenberg, Haffner 1997, Taf. 532|Artefacts PRL-4001
Artefacts PRL-4001
(b) After
x
Scholtus 2021, FIB2747, Haffner 1971, 20-21, Taf. 14, Artefacts FIB-4518
Scholtus 2021, PER1594, Cordie-Hackenberg, Haffner 1997, Taf. 417, j, Artefacts PRL-4001
Scholtus 2021, FIB2751, Artefacts FIB-4726, Cordie-Hackenberg, Haffner 1997, pl. 618, k
Scholtus 2021, FIB2748, Haffner 1971, 20-21, Taf. 14, Artefacts FIB-4518
Scholtus 2021, PER1600, Cordie-Hackenberg, Haffner 1997, Taf. 532, Artefacts PRL-4001
Scholtus 2021, PER1597, Artefacts PRL-4001


3.5 Create object description columns


In the ArkeoGIS structure, furniture is defined by a strict thesaurus system with four levels of precision (Table 21). This thesaurus is available in the online manual or can be download here.

For furniture, which corresponds to all the data in this database, the levels specify first the material of the object, then its category and finally its identification. Thus, it is possible to use the columns catégorie.objet and Materiaux to correlate the two tables. However, it is necessary to do some more data cleaning for this to work.

mobilier <- read.csv2("data/Mobilier.csv", sep = ";", encoding = "UTF-8")
Table 21: Extract from the thesaurus for describing furniture in ArkeoGIS
IDArkeoGIS CARAC_NAME CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4 IdArk IdPactols IdAat
208 Mobilier Os http://ark.frantiq.fr/ark:/26678/pcrtpy5s63Sdhx 15222
209 Mobilier Os Ossements humains http://ark.frantiq.fr/ark:/26678/pcrtf0jhaoDXkJ 16033
210 Mobilier Os Ossements humains Tabletterie https://ark.frantiq.fr/ark:/26678/pcrtEaFZFvpInN 17206
211 Mobilier Os Ossements humains Outils https://ark.frantiq.fr/ark:/26678/pcrtxs8Kml8jLw 16050 http://vocab.getty.edu/page/aat/300024841
212 Mobilier Os Ossements humains Parure https://ark.frantiq.fr/ark:/26678/pcrtNBrnOdttU9 16131 http://vocab.getty.edu/page/aat/300261068
213 Mobilier Os Ossements humains Autres NA


3.5.1 CARAC_LVL1


The first step is to modify the column Materiaux to match the existing entries in the column CARAC_LVL1

# Create the column CARAC_LVL1 according to the materials

n_objet[colnames(arkeoGIS[15])] <- n_objet$Materiaux

# Modify the data to match the thesaurus

n_objet[,22] <- gsub("Alliage.*|Argent|Fer|Laiton|Bronze|Electrum|Plomb|Or|Potin", 
                     "Métal", n_objet[,22])
n_objet[,22] <- gsub("Agate|Ambre|Pierre|Cornaline|Cristal de roche|Jais", 
                     "Lithique", n_objet[,22])

## To remove the multiple materials and keep only the first
n_objet[,22] <- gsub("^(\\w+)\\|.*", "\\1", n_objet[,22])

n_objet[,22] <- gsub("Bois de cerf|Ivoire", "Os", n_objet[,22])

Table 22: Content of column Materiaux

(a) Before
Var1 Freq
5264
Agate 24
Alliage au plomb 1
Alliage cuivreux 7434
Alliage cuivreux|Fer 1
Alliage cuivreux|Or 2
Alliage cuivreux|Os 3
Alliage cuivreux|Pierre 2
Alliage cuivreux|Sardonyx 1
Alliage cuivreux|Verre 3
Ambre 9
Argent 4202
Argent|Pierre 1
Bois de cerf 26
Bronze 3977
Céramique 388
Cornaline 2
Cristal de roche 1
Electrum 1
Fer 161
Fer|Pierre 2
Fer|Verre 2
Inconnu 3
Ivoire 6
Jais 3
Laiton 2
Métal 3
Or 685
Os 1624
Os|Bronze|Fer 1
Plomb 2
Potin 3864
Verre 2752
(b) After
Var1 Freq
5264
Céramique 388
Inconnu 3
Lithique 39
Métal 20349
Os 1657
Verre 2752


However, the material is missing for quite a number of objects. Since this is the first level of ArkeoGIS, it is not possible to ignore this element and it is therefore necessary to try to recover this information.

Table 23: Categories of objects for which the material is not specified
Var1 Freq
bague 1
bracelet 25
Épée 166
épingle 1
fibule 11
monnaie 4860
perle 200
# Assigning material according to object category, when it is missing

n_objet$CARAC_LVL1[n_objet$Materiaux == "" & n_objet$catégorie.objet == "Épée"] <- "Métal"
n_objet$CARAC_LVL1[n_objet$Materiaux == "" & n_objet$catégorie.objet == "monnaie"] <- "Métal"
n_objet$CARAC_LVL1[n_objet$Materiaux == "" & n_objet$catégorie.objet == "fibule"] <- "Métal"
n_objet$CARAC_LVL1[n_objet$Materiaux == "" & n_objet$catégorie.objet == "épingle"] <- "Métal"

n_objet$CARAC_LVL1[n_objet$Materiaux == "Inconnu" & n_objet$catégorie.objet == "fibule"] <- "Métal"

n_objet$CARAC_LVL1[n_objet$Materiaux == "" & n_objet$catégorie.objet == "bague"] <- "Verre"
n_objet$CARAC_LVL1[n_objet$Materiaux == "" & n_objet$catégorie.objet == "perle"] <- "Verre"
n_objet$CARAC_LVL1[n_objet$Materiaux == "" & n_objet$catégorie.objet == "bracelet"] <- "Verre"
Table 24: Content of column Materiaux
Var1 Freq
Céramique 388
Lithique 39
Métal 25390
Os 1657
Verre 2978


3.5.2 CARAC_LVL2, 3 et 4


Since we have now correctly defined the column CARAC_LVL1, it is possible to use this and the column Unique.public.identifier to complete levels 2, 3 and 4 of characterisation. To do this, the Table 25 allows us to retrieve the different associations of categories and materials and then search them in the ArkeoGIS thesaurus to import them into the corresponding rows.

However, not all associations exist in the thesaurus. For example, while it is possible to find a characterisation describing metal rings, bone rings will be entered under the term Ornament. For this reason, it is necessary to proceed in several steps to retrieve the lines of the thesaurus corresponding to the different possible associations. It will then be possible to use the merge function to collate the data sets.

Table 25: Distribution of object categories according to materials (CARAC_LVL1)
Céramique Lithique Métal Os Verre
bague 0 2 991 3 24
boucle d’oreille 0 0 41 0 0
bouton 0 0 121 32 0
bracelet 0 0 335 0 1488
ceinture 0 0 781 27 0
Épée 0 0 218 32 0
épingle 0 3 201 1512 0
fibule 0 0 4851 0 0
harnais 0 0 784 1 0
monnaie 0 0 16980 0 0
pendentif 0 2 47 41 2
perle 388 32 40 9 1464


3.5.2.1 For existing associations


One of the difficulties is that the category of the object is not always entered at the same level of characterisation within the thesaurus. Another problem is the case difference between the thesaurus, in which every term starts with a capital letter, and the source data, in which there is not always a capital letter. Also, the category Coin, which is singular in the source data, is plural in ArkeoGIS. The terms Harness and Pendant need to be changed to synonyms to match the thesaurus perfectly. Finally, the entry Earring is problematic in the source data because of the shape of the apostrophe used in French.

The code below, and the subsequent use of a for loop, allows this category to be searched across all the columns for possible materials and the various possible thesaurus rows to be stored in a new object.


# Add a capital letter to the beginning of each category term in the source data

n_objet$catégorie.objet <- stringr::str_to_sentence(n_objet$catégorie.objet)

# Replace problematic terms

n_objet$catégorie.objet[n_objet$catégorie.objet == "Monnaie"] <- "Monnaies"
n_objet$catégorie.objet[n_objet$catégorie.objet == "Harnais"] <- "Harnachement"
n_objet$catégorie.objet[n_objet$catégorie.objet == "Pendentif"] <- "Pendeloque"
n_objet$catégorie.objet <- gsub("Boucle d.*", "Boucle d'oreille", n_objet$catégorie.objet)

# Retrieve unique occurrences of each category

catego <- unique(n_objet$catégorie.objet)

Table 26: Content of column catégorie.objet

(a) Before
Var1 Freq
bague 1020
boucle d’oreille 41
bouton 153
bracelet 1823
ceinture 808
Épée 250
épingle 1716
fibule 4851
harnais 785
monnaie 16980
pendentif 92
perle 1933
(b) After
x
Fibule
Perle
Épée
Harnachement
Bracelet
Pendeloque
Monnaies
Ceinture
Bouton
Bague
Boucle d'oreille
Épingle
# Extract columns from the thesaurus
## According to the material (materio)
## by object category (catego)
## by searching for it in all the columns

materio <- unique(n_objet$CARAC_LVL1)
mobilier2 <- mobilier[,3:6]

IDS <- list()

for (m in 1:length(materio)) {
  IDS[[m]] <- list()
  
  for (c in 1:length(catego)) {
  
  IDS[[m]][[c]] <- list()
  
    for (i in 1:ncol(mobilier2)) {
      
      IDS[[m]][[c]][[i]] <- mobilier2[mobilier2$CARAC_LVL1 == materio[[m]] 
                                      & mobilier2[,i] == catego[[c]],]
      names(IDS[[m]])[[c]] <- catego[[c]]
      IDS[[m]][[c]] <- rlist::list.clean(IDS[[m]][[c]], 
                                         function(x) length(x) == 0L, TRUE)
    }
    
    IDS[[m]] <- rlist::list.clean(IDS[[m]], function(x) length(x) == 0L, TRUE)
  }
  IDS <- rlist::list.clean(IDS, function(x) length(x) == 0L, TRUE)
}

# Extract the data from the lists into a table
# The column `catégorie.objet` is created to serve as a join key for the rest of the table

for (i in 1:length(IDS)) {
  for (t in 1:length(IDS[[i]])){
    for (v in 1:length(IDS[[i]][[t]])) {
      IDS[[i]][[t]][[v]] <- cbind(as.data.frame(IDS[[i]][[t]][[v]]), 
                                  catégorie.objet = names(IDS[[i]])[[t]])
    }
    IDS[[i]][[t]] <- as.data.frame(IDS[[i]][[t]])
  }
  IDS[[i]] <- Reduce(rbind, IDS[[i]])
}

# Gathering the different associations into a single dataset

IDS <- Reduce(rbind, IDS)
Table 27: Extraction of the characterisation columns and the unique identifier according to the different possible associations
CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4 catégorie.objet
Métal Parure Fibule Fibule
Métal Armement Épée Épée
Métal Harnachement Harnachement
Métal Harnachement Mors Harnachement
Métal Harnachement Passe-guides Harnachement
Métal Harnachement Hipposandale Harnachement
Métal Harnachement Char Harnachement
Métal Parure Parure annulaire Bracelet Bracelet
Métal Parure Pendeloque Pendeloque
Métal Monnaies Monnaies
Métal Monnaies Or Monnaies
Métal Monnaies Argent Monnaies
Métal Monnaies Bronze Monnaies
Métal Monnaies Potin Monnaies
Métal Monnaies Electrum Monnaies
Métal Monnaies Monnaie fourrée Monnaies
Métal Monnaies Fausse monnaie Monnaies
Métal Monnaies Autres Monnaies
Métal Parure Ceinture Ceinture
Métal Parure Parure annulaire Bague Bague
Métal Parure Boucle d'oreille Boucle d'oreille
Métal Parure Épingle Épingle
Verre Parure Perle Perle
Verre Parure Bracelet Bracelet
Lithique Parure Perle Perle


We notice on the Table 27 that for the categories Harness and Coins several entries are possible. While for coins we can use the column Materiaux to specify the source dataset coins, for the harness items it will not be possible to be more precise and the extra rows must be removed.

# Delete rows with levels of precision too high for the "Harness" category

IDS[IDS == ""] <- NA
IDS <- IDS[IDS$CARAC_LVL2 == "Harnachement" & is.na(IDS$CARAC_LVL3) == TRUE 
           | IDS$CARAC_LVL2 != "Harnachement",]
Table 28: Removal of excessive levels of detail
CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4 catégorie.objet
1 Métal Parure Fibule NA Fibule
2 Métal Armement Épée NA Épée
3 Métal Harnachement NA NA Harnachement
8 Métal Parure Parure annulaire Bracelet Bracelet
9 Métal Parure Pendeloque NA Pendeloque
10 Métal Monnaies NA NA Monnaies
11 Métal Monnaies Or NA Monnaies
12 Métal Monnaies Argent NA Monnaies
13 Métal Monnaies Bronze NA Monnaies
14 Métal Monnaies Potin NA Monnaies
15 Métal Monnaies Electrum NA Monnaies
16 Métal Monnaies Monnaie fourrée NA Monnaies
17 Métal Monnaies Fausse monnaie NA Monnaies
18 Métal Monnaies Autres NA Monnaies
19 Métal Parure Ceinture NA Ceinture
20 Métal Parure Parure annulaire Bague Bague
21 Métal Parure Boucle d'oreille NA Boucle d'oreille
22 Métal Parure Épingle NA Épingle
23 Verre Parure Perle NA Perle
24 Verre Parure Bracelet NA Bracelet
25 Lithique Parure Perle NA Perle


3.5.2.2 For non-existent associations


As mentioned above, some object categories are not directly represented in the ArkeoGIS thesaurus. For these, it is necessary to reduce the level of precision. Again, we can referred to the Table 25 table can be referred to check which associations are missing. It appears that these are essentially elements of adornment made of materials other than metal, as well as a few elements made of bone, which can be entered as Others.

# Recovering the different associations of categories and materials

categorie <- as.data.frame(table(n_objet$catégorie.objet, n_objet$CARAC_LVL1))
categorie <- categorie[categorie$Freq != 0, 1:2]

# Indicate a lower level of description (Adornment and Others)
categorie <- transform(
  categorie, Niv0 = ifelse(Var1 == "Fibule" | Var1 == "Perle" | Var1 == "Bague" | Var1 == "Boucle d'oreille" |
                        Var1 == "Bouton" | Var1 == "Bracelet" | Var1 == "Ceinture" | Var1 == "Épingle" | Var1 == "Pendeloque",
                        "Parure", "Autres")
)

# Delete previously found associations

colnames(categorie)[1:2] <- colnames(IDS[c(5,1)])
categorie <- droplevels(dplyr::anti_join(categorie, IDS, by = c("catégorie.objet", "CARAC_LVL1") ))
Table 29: Linking categories, at lower levels of detail
catégorie.objet CARAC_LVL1 Niv0
Perle Céramique Parure
Bague Lithique Parure
Épingle Lithique Parure
Pendeloque Lithique Parure
Bouton Métal Parure
Perle Métal Parure
Bague Os Parure
Bouton Os Parure
Ceinture Os Parure
Épée Os Autres
Épingle Os Parure
Harnachement Os Autres
Pendeloque Os Parure
Perle Os Parure
Bague Verre Parure
Pendeloque Verre Parure


It is now possible to use the categorie table with the same code as before, adapted to retrieve the corresponding rows from the thesaurus.

materio2 <- unique(categorie$CARAC_LVL1)
catego2 <- as.character(unique(categorie$catégorie.objet))
niv <- unique(categorie$Niv0)

IDS2 <- list()

for (m in 1:length(materio2)) {
  IDS2[[m]] <- list()
  
  for (c in 1:length(catego2)) {
    
    IDS2[[m]][[c]] <- list()
    
    for (n in 1:length(niv)) {
  
      IDS2[[m]][[c]][[n]] <- list()
  
      for (i in 1:ncol(mobilier2)) {
      
      
        IDS2[[m]][[c]][[n]][[i]] <- mobilier2[mobilier2$CARAC_LVL1 == materio2[[m]] 
                                              & mobilier2[,i] == niv[[n]],]
        names(IDS2[[m]])[[c]] <- catego2[[c]]
        names(IDS2[[m]][[c]])[[n]] <- niv[[n]]
        IDS2[[m]][[c]][[n]] <- rlist::list.clean(IDS2[[m]][[c]][[n]], 
                                                 function(x) length(x) == 0L, TRUE)
      }
    
      IDS2[[m]][[c]] <- rlist::list.clean(IDS2[[m]][[c]], function(x) length(x) == 0L, TRUE)
    }
    IDS2[[m]] <- rlist::list.clean(IDS2[[m]], function(x) length(x) == 0L, TRUE)
  }
  IDS2 <- rlist::list.clean(IDS2, function(x) length(x) == 0L, TRUE)
}

# Extract the data from the lists into a table
# The column `categorie.objet` is created to serve as join key for the rest of the table

for (i in 1:length(IDS2)) {
  for (t in 1:length(IDS2[[i]])){
    for (v in 1:length(IDS2[[i]][[t]])) {
      for (w in 1:length(IDS2[[i]][[t]][[v]])){
        IDS2[[i]][[t]][[v]][[w]] <- cbind(as.data.frame(IDS2[[i]][[t]][[v]][[w]]), 
                                          catégorie.objet = names(IDS2[[i]])[[t]], 
                                          Niv0 = names(IDS2[[i]][[t]])[[v]])
      }
      
      IDS2[[i]][[t]][[v]] <- Reduce(rbind, IDS2[[i]][[t]][[v]])
    }
    IDS2[[i]][[t]] <- Reduce(rbind, IDS2[[i]][[t]])
  }
  IDS2[[i]] <- Reduce(rbind, IDS2[[i]])
}

# Gathering the different associations into a single dataset

IDS2 <- Reduce(rbind, IDS2)

# Delete rows already present in the IDS object

IDS2 <- droplevels(dplyr::anti_join(IDS2, IDS, by = c("catégorie.objet", "CARAC_LVL1")))


# Refine the selection to keep only the rows that correspond to associations 
# actually present in the source dataset

IDS2 <- droplevels(left_join(categorie, IDS2, by = c("catégorie.objet", "CARAC_LVL1", 
                                                     "Niv0")))

IDS2[IDS2 == ""] <- NA

parure <- list()
autres <- list()
for (i in 1:ncol(IDS2)) {
  if(i < ncol(IDS2)) {
    parure[[i]] <- droplevels(subset(IDS2, IDS2[,i] == "Parure" 
                                     & is.na(IDS2[,i+1]) == TRUE))
    autres[[i]] <- droplevels(subset(IDS2, IDS2[,i] == "Autres" 
                                     & is.na(IDS2[,i+1]) == TRUE))
  }
}

parure <- Reduce(rbind, parure)
autres <- Reduce(rbind, autres)
IDS2 <- rbind(parure, autres)
rownames(IDS2) <- c()
Table 30: Recovery of associations with a lower level of accuracy
catégorie.objet CARAC_LVL1 Niv0 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4
1 Bague Lithique Parure Parure NA NA
27 Bague Lithique Parure Parure Autres NA
7 Pendeloque Verre Parure Parure NA NA
8 Perle Céramique Parure Autres Parure NA
9 Bague Os Parure Ossements humains Parure NA
10 Bague Os Parure Ossements animaux Parure NA
11 Bague Os Parure Indéterminé Parure NA
12 Bouton Os Parure Ossements humains Parure NA
13 Bouton Os Parure Ossements animaux Parure NA
14 Bouton Os Parure Indéterminé Parure NA
15 Ceinture Os Parure Ossements humains Parure NA
16 Ceinture Os Parure Ossements animaux Parure NA
17 Ceinture Os Parure Indéterminé Parure NA


However, there are still some problems with obtained the selection Table 30. For bone objects, the thesaurus allows us to specify whether they are human or animal bones. Since we do not have this information, these possibilities should be removed and only the entry Unknown kept. Likewise, there are sometimes several possibilities for a single association. This is the case here for lithic rings (rows 1 and 27 of Table 30), which can be designated by Jewellery or Jewellery/Others. It is necessary to make a choice and remove these duplicates for the merging of the datasets to work.2

# Delete rows with bone types in the CARAC_LVL2 column

IDS2 <- IDS2[-grep("Ossements", IDS2$CARAC_LVL2),]

# Delete the associations Jewellery / Other

autres <- filter(IDS2, IDS2$CARAC_LVL2 == "Parure" & IDS2$CARAC_LVL3 == "Autres")

IDS2 <- droplevels(dplyr::anti_join(IDS2, autres, by = 
                                      c("catégorie.objet", "CARAC_LVL1", 
                                        "CARAC_LVL2", "CARAC_LVL3")))


Table 31: The different associations possible in the dataset
catégorie.objet CARAC_LVL1 Niv0 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4
Bague Lithique Parure Parure NA NA
Épingle Lithique Parure Parure NA NA
Pendeloque Lithique Parure Parure NA NA
Bouton Métal Parure Parure NA NA
Perle Métal Parure Parure NA NA
Bague Verre Parure Parure NA NA
Pendeloque Verre Parure Parure NA NA
Perle Céramique Parure Autres Parure NA
Bague Os Parure Indéterminé Parure NA
Bouton Os Parure Indéterminé Parure NA
Ceinture Os Parure Indéterminé Parure NA
Épingle Os Parure Indéterminé Parure NA
Pendeloque Os Parure Indéterminé Parure NA
Perle Os Parure Indéterminé Parure NA
Épée Os Autres Indéterminé Autres NA
Harnachement Os Autres Indéterminé Autres NA

3.5.2.3 Merging the data


It is now possible to merge the source dataset with the different variants of the thesaurus: for existing associations, for currencies and for non-existing associations.

The first step is to separate the coins information from the thesaurus IDS and store it in a new object IDS_mon. It is now possible to merge the n_objet table with the IDS thesaurus items, then delete the unnecessary columns and reorder the others according to the ArkeoGIS data structure.

# Separate coins from the thesaurus as they work differently

IDS_mon <- droplevels(subset(IDS, CARAC_LVL2 == "Monnaies"))
IDS <- droplevels(subset(IDS, CARAC_LVL2 != "Monnaies"))

# Merge the simple thesaurus (`IDS`) with the dataset

IDS <- droplevels(merge(IDS, n_objet, by=c("CARAC_LVL1", "catégorie.objet")))
IDS <- IDS[,c("Unique.public.identifier", colnames(arkeoGIS))]
Table 32: Dataset extract for existing associations
Unique.public.identifier SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME GEONAME_ID PROJECTION_SYSTEM LONGITUDE LATITUDE ALTITUDE CITY_CENTROID OCCUPATION STATE_OF_KNOWLEDGE STARTING_PERIOD ENDING_PERIOD CARAC_NAME CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4 CARAC_EXP BIBLIOGRAPHY COMMENTS WEB_IMAGES
PER1370 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 201:250 301:350 Mobilier Lithique Parure Perle NA Non Scholtus 2021, PER1370, Riha 1990, n° 1353 OBJET : perle. MATERIAU : Agate. TYPO : Riha 1990 type 11.33.1. NA
PER1367 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 1:25 151:200 Mobilier Lithique Parure Perle NA Non Scholtus 2021, PER1367, Riha 1990, n° 2952b OBJET : perle. MATERIAU : Agate. TYPO : Riha 1990 type 11.32. NA
PER1371 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 201:250 301:350 Mobilier Lithique Parure Perle NA Non Scholtus 2021, PER1371, Riha 1990, n° 1357c OBJET : perle. MATERIAU : Agate. TYPO : Riha 1990 type 11.33.1. NA
PER1372 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 201:250 301:350 Mobilier Lithique Parure Perle NA Non Scholtus 2021, PER1372, Riha 1990, n° 1357d OBJET : perle. MATERIAU : Agate. TYPO : Riha 1990 type 11.33.1. NA
PER1368 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 201:250 301:350 Mobilier Lithique Parure Perle NA Non Scholtus 2021, PER1368, Riha 1990, n° 1351 OBJET : perle. MATERIAU : Agate. TYPO : Riha 1990 type 11.33.1. NA
PER1381 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 1:25 351:400 Mobilier Lithique Parure Perle NA Non Scholtus 2021, PER1381, Artefacts PRL-4160, Riha 1990, n° 2790g OBJET : perle. MATERIAU : Cornaline. TYPO : Riha 1990 type 11.34. NA


For non-existent associations, i.e. table IDS2, it is first necessary to add the category information for the objects, in particular the column Niv0, to the source dataset to enable merging. However, this operation adds duplicates which must then be deleted.

# Merge the more complex thesaurus (`IDS2`) with the dataset
n_objet2 <- droplevels(merge(n_objet, categorie[,c(1,3)], by="catégorie.objet"))
IDS2 <- droplevels(merge(IDS2, n_objet2, by=c("CARAC_LVL1", "catégorie.objet", "Niv0")))
IDS2 <- IDS2[,c("Unique.public.identifier", colnames(arkeoGIS))]

# Delete duplicates

IDS2 <- IDS2[!duplicated(IDS2),]
Table 33: Dataset extract for non-existent associations
Unique.public.identifier SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME GEONAME_ID PROJECTION_SYSTEM LONGITUDE LATITUDE ALTITUDE CITY_CENTROID OCCUPATION STATE_OF_KNOWLEDGE STARTING_PERIOD ENDING_PERIOD CARAC_NAME CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4 CARAC_EXP BIBLIOGRAPHY COMMENTS WEB_IMAGES
1 PER0560 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 26:50 251:300 Mobilier Céramique Autres Parure NA Non Scholtus 2021, PER0560, Riha 1990, n° 829 OBJET : perle. MATERIAU : Céramique. TYPO : Riha 1990 type 11.1.1. NA
3 PER0775 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 26:50 251:300 Mobilier Céramique Autres Parure NA Non Scholtus 2021, PER0775, Riha 1990, n° 1045 OBJET : perle. MATERIAU : Céramique. TYPO : Riha 1990 type 11.1.1. NA
5 PER0777 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 26:50 251:300 Mobilier Céramique Autres Parure NA Non Scholtus 2021, PER0777, Riha 1990, n° 1047 OBJET : perle. MATERIAU : Céramique. TYPO : Riha 1990 type 11.1.1. NA
6 PER0519 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 26:50 251:300 Mobilier Céramique Autres Parure NA Non Scholtus 2021, PER0519, Riha 1990, n° 788 OBJET : perle. MATERIAU : Céramique. TYPO : Riha 1990 type 11.1.1. NA
7 PER0566 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 26:50 251:300 Mobilier Céramique Autres Parure NA Non Scholtus 2021, PER0566, Riha 1990, n° 835 OBJET : perle. MATERIAU : Céramique. TYPO : Riha 1990 type 11.1.1. NA
8 PER0846 SIT0092 Augusta Raurica Augst NA 4326 7.71468 47.53557 NA Oui Non renseigné Non renseigné 26:50 251:300 Mobilier Céramique Autres Parure NA Non Scholtus 2021, PER0846, Riha 1990, n° 1116 OBJET : perle. MATERIAU : Céramique. TYPO : Riha 1990 type 11.1.1. NA


Finally, to add the coin thesaurus, IDS_mon, it is first necessary to change to NA the materials already specified in the source dataset.

# Merging the coins thesaurus with the dataset
n_objet$Materiaux[n_objet$Materiaux == ""] <- NA

IDS_mon <- droplevels(merge(IDS_mon, n_objet, 
                            by.x=c("CARAC_LVL1", "catégorie.objet","CARAC_LVL3"), 
                            by.y=c("CARAC_LVL1", "catégorie.objet","Materiaux")))
IDS_mon <- IDS_mon[,c("Unique.public.identifier", colnames(arkeoGIS))]
Table 34: Dataset extract for coins
Unique.public.identifier SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME GEONAME_ID PROJECTION_SYSTEM LONGITUDE LATITUDE ALTITUDE CITY_CENTROID OCCUPATION STATE_OF_KNOWLEDGE STARTING_PERIOD ENDING_PERIOD CARAC_NAME CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4 CARAC_EXP BIBLIOGRAPHY COMMENTS WEB_IMAGES
MON6709 SIT0175 Rotacker Kirchzarten - Zarten NA 4326 7.953658 47.96279 NA Oui Non renseigné Non renseigné -150:-126 -50:-26 Mobilier Métal Monnaies Argent NA Non Scholtus 2021, MON6709, n°408, Nick 2006 OBJET : monnaie. MATERIAU : Argent. TYPO : KALETEDOY, Lingon. NA
MON3625 SIT0352 Robache Saint-Dié-des-Vosges NA 4326 6.956481 48.30710 NA Non Non renseigné Non renseigné -150:-126 -50:-26 Mobilier Métal Monnaies Argent NA Non Scholtus 2021, MON3625, n°358, Lagadec, Liéger 1998 OBJET : monnaie. MATERIAU : Argent. TYPO : KALETEDOY, Lingon. NA
MON3980 SIT0352 Robache Saint-Dié-des-Vosges NA 4326 6.956481 48.30710 NA Non Non renseigné Non renseigné -150:-126 -50:-26 Mobilier Métal Monnaies Argent NA Non Scholtus 2021, MON3980, Lagadec, Liéger 1998, n°358 OBJET : monnaie. MATERIAU : Argent. TYPO : KALETEDOY, Lingon. NA
MON3957 SIT0352 Robache Saint-Dié-des-Vosges NA 4326 6.956481 48.30710 NA Non Non renseigné Non renseigné -150:-126 -50:-26 Mobilier Métal Monnaies Argent NA Non Scholtus 2021, MON3957, Lagadec, Liéger 1998, n°358 OBJET : monnaie. MATERIAU : Argent. TYPO : KALETEDOY, Lingon. NA
MON4199 SIT0352 Robache Saint-Dié-des-Vosges NA 4326 6.956481 48.30710 NA Non Non renseigné Non renseigné -150:-126 -50:-26 Mobilier Métal Monnaies Argent NA Non Scholtus 2021, MON4199, n°358, Lagadec, Liéger 1998 OBJET : monnaie. MATERIAU : Argent. TYPO : KALETEDOY, Lingon. NA
MON3501 SIT0352 Robache Saint-Dié-des-Vosges NA 4326 6.956481 48.30710 NA Non Non renseigné Non renseigné -150:-126 -50:-26 Mobilier Métal Monnaies Argent NA Non Scholtus 2021, MON3501, Lagadec, Liéger 1998, n°358 OBJET : monnaie. MATERIAU : Argent. TYPO : KALETEDOY, Lingon. NA


Finally, these three tables are joined to recreate a single dataset named total and the column Unique.public.identifier is removed.

# Join the data

total <- rbind(IDS, IDS2, IDS_mon)

# Delete the Unique.public.identifier column

total <- total[,-1]


4 Add new lines for sites


The total table is now ready to be imported into ArkeoGIS. However, we have seen in the Section 1 that although this is a table dedicated to furnitures, a second table allows to define more precisely the sites from where these objects come. It is therefore possible to add rows to the total dataset to specify these sites.


4.1 Retrieving site information


The first step is to retrieve in a new table sites2 the unique occurrences of each site in the table total using the unique identifier SITE_SOURCE_ID. The dating, characteristics and bibliography columns can then be deleted as these will be replaced by data corresponding to the sites themselves. The data in the column COMMENTS is deleted.

# Retrieve unique hits for each site

sites2 <- distinct(total, SITE_SOURCE_ID, .keep_all = TRUE)

# Delete the columns that will be replaced

sites2 <- sites2[,c(1:11, 21, 22)]

# Delete the contents of the COMMENTS column

sites2[,12] <- NA


4.2 Dates


The column Datation in the table sites (Table 35: Before) does not correspond to the structure and format requested by ArkeoGIS for dates. They are also not structured, data are input in the form of dates or text without logic. They therefore require cleaning and homogenisation.

The first step is to replace the textual information with the dates from the chronology of continental Europe proposed by ArkeoGIS, but also to structure the data so that they all have the same format and can be manipulated later.

# Replace the period names with the corresponding dates

sites[,5] <- sites[,5] %>%
  str_replace_all(c("[Pp]ériode augustéenne|[Aa]ugustéen|début époque romaine" = "-25:14",
                    "Bas-?\\s?Empire" = "236:476",
                    "haut Moyen-Age" = "477:986",
                    "Fin Ier-IIe s. ap. J.-C." = "76:200",
                    "Gallo-romain|[Gg]al+o|époque romaine" = "-25:476",
                    "Hallstatt" = "-800:-481",
                    "Mérovingien" = "477:800",
                    "Haut-?\\s?Empire" = "15:235",
                    "Auguste-Claude" = "-25:50",
                    "IIIe s\\.|IIIe" = "201:300",
                    "Milieu Ier ap. J.-C." = "26:75",
                    "Ier" = "1:100",
                    "IVe s. av." = "-400:-301",
                    "IVe s. apr.|IVe s. ap. J.-C." = "301:400",
                    "Ve" = "401:500",
                    "VIIe s. apr." = "601:700",
                    "IIe s. apr." = "101:200",
                    "La Tène\\s|LT\\s" = "-480:-26",
                    "LTA-B" = "-480:-261",
                    "LTA" = "-480:-381",
                    "LTB" = "-380:-261",
                    "LTC1" = "-260:-201",
                    "LTC2" = "-200:-151",
                    "LTC" = "-260:-151",
                    "LTD1-D2a" = "-150:-51",
                    "LTD1b_D2a" = "-125:-51",
                    "LTD1a" = "-150:-126",
                    "LTD1b" = "-125:-76",
                    "LTD1" = "-150:-76",
                    "LT?D2a" = "-75:-51",
                    "LTD2b|Précoce" = "-50:-26",
                    "LTD2" = "-75:-26",
                    "LTD" = "-150:-26",
                    
                    "Non renseigné|^$" = "Indéterminé",
                    ";|/|au" = "-",
                    "--" = " - -",
                    "(\\d+)-\\s" = "\\1 - ",
                    "(\\d+)-(\\d+)" = "\\1 - \\2"
                    ))

Table 35: Datations column

(a) Before
Var1 Freq
586
-10:30 2
-120:-50 4
-150:-126 1
-150:-26 79
-150:-76 3
-150:26 1
-150:30 1
-200:-151 5
-225:-201 2
Augustéen 2
Bas-Empire 1
Bas-Empire au haut Moyen-Age 1
Fin Ier-IIe s. ap. J.-C. 1
Gallo 8
Hallstatt - Mérovingien 1
Haut-Empire 4
Haut-Empire - Bas-Empire 1
Haut-Empire - IIIe 1
Haut-Empire au Bas-Empire 20
Haut-Empire au haut Moyen-Age 6
Haut-Empire au IIIe s. 2
Ier-Ve 1
IIIe s. au Bas Empire 2
IIIe s. au haut Moyen-Age 1
Indéterminé 2
IVe s. av. - IVe s. apr. 1
La Tène au Bas-Empire 5
LT - Gallo 2
LTA-B / LTD 1
LTA - Haut-Empire 1
LTB-Gallo 1
LTC 12
LTC-gallo 1
LTC-Gallo 4
LTC-LTD 2
LTC1 2
LTC1-Gallo 1
LTC1-LTC2 1
LTC2 1
LTC2-Gallo 2
LTC2-LTD1 8
LTC2-LTD1a 1
LTC2-LTD2 1
LTC2 - IVe s. ap. J.-C. 1
LTC2 ; Auguste-Claude 1
LTD 142
LTD-augustéen 1
LTD-gallo 1
LTD-Gallo 63
LTD-Gallo-romain 2
LTD-Galo 1
LTD - Augustéen 1
LTD - Gallo 1
LTD - période augustéenne 2
LTD - VIIe s. apr. 1
LTD1 4
LTD1-D2a 1
LTD1-Gallo 1
LTD1-Période augustéenne 1
LTD1 - début époque romaine 1
LTD1b 3
LTD1b-Gallo 2
LTD1b-LD2a 1
LTD1b-LTD2a 1
LTD1b - IIe s. apr. 1
LTD1b_D2a - époque romaine 1
LTD2 6
LTD2-Galllo 1
LTD2-Gallo 7
LTD2-Haut Empire 2
LTD2 - Gallo 1
LTD2b - Augustéen 2
Mérovingien 1
Milieu Ier ap. J.-C. 1
Non renseigné 1
Période augustéenne 2
Précoce - Augustéen 1
(b) After
Var1 Freq
-10:30 2
-120:-50 4
-125:-51 - -25:476 1
-125:-76 3
-125:-76 - -25:476 2
-125:-76 - -75:-51 2
-125:-76 - 101:200 1
-150:-126 1
-150:-26 221
-150:-26 - -25:14 4
-150:-26 - -25:476 68
-150:-26 - 601:700 1
-150:-51 1
-150:-76 7
-150:-76 - -25:14 2
-150:-76 - -25:476 1
-150:26 1
-150:30 1
-200:-151 6
-200:-151 - -150:-126 1
-200:-151 - -150:-76 8
-200:-151 - -25:476 2
-200:-151 - -25:50 1
-200:-151 - -75:-26 1
-200:-151 - 301:400 1
-225:-201 2
-25:14 4
-25:476 8
-260:-151 12
-260:-151 - -150:-26 2
-260:-151 - -25:476 5
-260:-201 2
-260:-201 - -200:-151 1
-260:-201 - -25:476 1
-380:-261 - -25:476 1
-400:-301 - 301:400 1
-480:-26 - -25:476 2
-480:-26 - 236:476 5
-480:-261 - -150:-26 1
-480:-381 - 15:235 1
-50:-26 - -25:14 3
-75:-26 6
-75:-26 - -25:476 9
-75:-26 - 15:235 2
-800:-481 - 477:800 1
1:100 - 401:500 1
15:235 4
15:235 - 201:300 3
15:235 - 236:476 21
15:235 - 477:986 6
201:300 - 236:476 2
201:300 - 477:986 1
236:476 1
236:476 - 477:986 1
26:75 1
477:800 1
76:200 1
Indéterminé 589


It is now possible to repeat the code used Section 3.4.2 to create the columns STARTING_PERIOD, ENDING_PERIOD in the table sites2, adapting the regular expressions to the actual data.

# Retrieve start and end dates in separate columns

date <- data.frame(STARTING_PERIOD = character(), ENDING_PERIOD = character())
date <- strcapture("^(-?\\d+:-?\\d+)\\s-\\s(-?\\d+:-?\\d+)$", sites$Datation, date)

# Retrieve unique dates

date2 <- data.frame(STARTING_PERIOD = character())
date2 <- strcapture("^(-?\\d+:-?\\d+)$", sites$Datation, date2)
# Duplicate the column to have an end date
date2$ENDING_PERIOD <- date2$STARTING_PERIOD

# Add an identical unique identifier to both tables
# This works because missing data is indicated by NAs 
# and the order of the data is maintained

date$ID <- paste0("ID", 1:nrow(date))
date2$ID <- paste0("ID", 1:nrow(date2))

# Join the two tables by replacing the missing data in the `date` table 
# with the data in the `date2` table and replace the remaining NA with "undetermined"

date3 <- dplyr::left_join(date, date2, by = "ID") %>%
  
  ## This operation creates new STARTING_PERIOD and ENDING_PERIOD columns x and y
  # Replace the NAs with the data available in the y columns
  
  dplyr::mutate(STARTING_PERIOD = ifelse(is.na(STARTING_PERIOD.x), 
                                         STARTING_PERIOD.y, STARTING_PERIOD.x)) %>%
  dplyr::mutate(ENDING_PERIOD = ifelse(is.na(ENDING_PERIOD.x), 
                                       ENDING_PERIOD.y, ENDING_PERIOD.x)) %>%
  
  # Delete excess columns
  
  dplyr::select(-STARTING_PERIOD.y, -STARTING_PERIOD.x) %>%
  dplyr::select(-ENDING_PERIOD.y, -ENDING_PERIOD.x) %>%
  
  # And replace the remaining NA with the word "undetermine".
  
  replace(is.na(.), "Indéterminé")

# Delete ID column

date3 <- date3[,-1]
Table 36: Extrait des nouvelles colonnes de datations
STARTING_PERIOD ENDING_PERIOD
-400:-301 301:400
Indéterminé Indéterminé
Indéterminé Indéterminé
-150:-26 -150:-26
-75:-26 -75:-26
-150:-76 -150:-76


These new date columns can now be added to the sites table.

# Add the dates to the `sites` table

sites <- cbind(sites, date3)
Table 37: Adding date columns to the sites table
SITE_SOURCE_ID Centroide Etat.de.la.recherche Occupation Datation Bibliographie Activités.artisanales Type.de.site STARTING_PERIOD ENDING_PERIOD
SIT0001 VRAI Fouillé Non renseigné -400:-301 - 301:400 R. Cordie-Hachenberg, A. Haffner 1991, Haffner 1971 Funéraire -400:-301 301:400
SIT0002 VRAI Non renseigné Non renseigné Indéterminé Habitat indéterminé Indéterminé Indéterminé
SIT0003 VRAI Non renseigné Non renseigné Indéterminé Indéterminé Indéterminé Indéterminé
SIT0004 VRAI Non renseigné Non renseigné -150:-26 Nick 2006 Féliu 2008 Habitat de plaine -150:-26 -150:-26
SIT0005 VRAI Sondé Unique -75:-26 ArkeoGIS D. Mischka 2007 Hofmann 1999, 74 Weber-Jenisch 1995, 136 Röder 1995, 176-17 Bender et alii 1993 Féliu 2008 Habitat indéterminé -75:-26 -75:-26
SIT0006 VRAI Prospecté pédestre Unique -150:-76 Hofmann 1999, 79 Weber-Jenisch 1995, 138-139 Burkhardt 1998 Féliu 2008 Fer Dépôt; Habitat indéterminé -150:-76 -150:-76


4.3 Linking information to the sites in the object base


Now that the dates are in the ArkeoGIS format, it is possible to add them to the sites2 table, i.e. the one containing the sites mentioned in the objects table, together with the bibliographic information, using the unique identifier of the sites.

# Add date and bibliography columns to the `sites2` table

sites2 <- droplevels(merge(sites2, sites[,c(1,6,7:10)], by = "SITE_SOURCE_ID"))

# Rename the bibliography column to match ArkeoGIS

colnames(sites2)[14] <- colnames(arkeoGIS[20])
Table 38: Add date and bibliography columns to the sites2 table
SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME GEONAME_ID PROJECTION_SYSTEM LONGITUDE LATITUDE ALTITUDE CITY_CENTROID OCCUPATION STATE_OF_KNOWLEDGE COMMENTS WEB_IMAGES BIBLIOGRAPHY Activités.artisanales Type.de.site STARTING_PERIOD ENDING_PERIOD
SIT0001 Wederath-Belginum Wederath NA 4326 7.164440 49.86499 NA Oui Non renseigné Fouillé NA NA R. Cordie-Hachenberg, A. Haffner 1991, Haffner 1971 Funéraire -400:-301 301:400
SIT0002 Cherain-Brisy Cherain NA 4326 5.865210 50.17996 NA Oui Non renseigné Non renseigné NA NA Habitat indéterminé Indéterminé Indéterminé
SIT0003 Han-sur-Lesse Han-sur-Lesse NA 4326 5.186230 50.12674 NA Oui Non renseigné Non renseigné NA NA Indéterminé Indéterminé Indéterminé
SIT0004 Klosteräcker Breisach-Hochstetten NA 4326 7.625970 48.02291 NA Oui Non renseigné Non renseigné NA NA Nick 2006 Féliu 2008 Habitat de plaine -150:-26 -150:-26
SIT0005 Münsterberg Breisach NA 4326 7.579663 48.02931 NA Oui Unique Sondé NA NA ArkeoGIS D. Mischka 2007 Hofmann 1999, 74 Weber-Jenisch 1995, 136 Röder 1995, 176-17 Bender et alii 1993 Féliu 2008 Habitat indéterminé -75:-26 -75:-26
SIT0006 Kegelriß Ehrenkirchen-Ehrenstetten NA 4326 7.776640 47.89846 NA Oui Unique Prospecté pédestre NA NA Hofmann 1999, 79 Weber-Jenisch 1995, 138-139 Burkhardt 1998 Féliu 2008 Fer Dépôt; Habitat indéterminé -150:-76 -150:-76


4.4 Add sites characteristics


The sites table contains two pieces of information to characterise the sites which correspond to two different thesauri in ArkeoGIS: the types of sites and the artisanal activities. It is therefore necessary to proceed in two successive steps.


4.4.1 Types of sites


Site types are indicated by the Real Estate feature in ArkeoGIS.

immo <- read.csv2("data/Immobilier.csv", sep = ";", encoding = "UTF-8")
Table 39: Thesaurus extract for describing real estate in ArkeoGIS
IDArkeoGIS CARAC_NAME CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4 IdArk IdPactols IdAat
2446 Immobilier Édifice public https://ark.frantiq.fr/ark:/26678/pcrt4HEc49rMp7 17636
2447 Immobilier Édifice public Amphithéâtre https://ark.frantiq.fr/ark:/26678/pcrtEmYHAsBI3W 13174
2448 Immobilier Édifice public Odéon https://ark.frantiq.fr/ark:/26678/pcrtzY8mRfSGA7 15958
2449 Immobilier Édifice public Théâtre https://ark.frantiq.fr/ark:/26678/pcrtU6s8jjsWol 17297
524 Immobilier Habitat http://ark.frantiq.fr/ark:/26678/pcrtbptj4SOA1W 14990 http://vocab.getty.edu/page/aat/300008347
525 Immobilier Habitat Non renseigné NA
Tip

The thesaurus can also be download here.


The first step to align the data with this thesaurus is to specify in the sites2 table that the following features refer to real estate by creating the column CARAC_NAME

# Create the column CARAC_NAME with the information "Real Estate"

sites2[,colnames(arkeoGIS[14])] <- "Immobilier"


The following features, as for furniture, require cleaning of existing data and adaptation to the thesaurus. Furthermore, there may be several types of sites in the same cell, separated by a ; (Table 40). This information must be extracted and the site concerned duplicated as many times as there are features (Table 41).

# Separate and multiply rows with several types

sites2 <- tidyr::separate_rows(sites2, Type.de.site, sep = ";")

# Remove superfluous spaces at the beginning and end of cells

sites2$Type.de.site <- str_trim(sites2$Type.de.site)

Table 40: Content of column Type.de.site in table sites

(a) Before
Var1 Freq
184
Autre 5
Camp militaire 5
Dépôt 12
Dépôt; Habitat de plaine 4
Dépôt; Habitat indéterminé 2
Funéraire 164
Funéraire; Habitat indéterminé 2
Funéraire; Indéterminé 5
Habitat de hauteur 2
Habitat de hauteur; Indéterminé; Site fortifié 1
Habitat de hauteur; Site fortifié 20
Habitat de plaine 65
Habitat de plaine; Indéterminé 2
Habitat de plaine; Site fortifié 6
Habitat indéterminé 77
Habitat indéterminé; Indéterminé 3
Habitat indéterminé; Indéterminé; Site fortifié 2
Habitat indéterminé; Sanctuaire 2
Habitat indéterminé; Site fortifié 26
Indéterminé 423
Sanctuaire 10
Site d'exploitation 2
Site fortifié 1
Structure isolée 1
Trouvaille isolée 18
(b) After
Var1 Freq
180
Autre 5
Camp militaire 5
Dépôt 17
Funéraire 159
Habitat de hauteur 22
Habitat de plaine 64
Habitat indéterminé 106
Indéterminé 429
Sanctuaire 12
Site d'exploitation 2
Site fortifié 49
Trouvaille isolée 18
Table 41: The sites2 table after separation of multiple values
SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME GEONAME_ID PROJECTION_SYSTEM LONGITUDE LATITUDE ALTITUDE CITY_CENTROID OCCUPATION STATE_OF_KNOWLEDGE COMMENTS WEB_IMAGES BIBLIOGRAPHY Activités.artisanales Type.de.site STARTING_PERIOD ENDING_PERIOD CARAC_NAME
SIT0068 Besançon Besançon NA 4326 6.01815 47.24878 NA Oui Non renseigné Non renseigné NA NA Habitat indéterminé -150:-26 -150:-26 Immobilier
SIT0069 Dragages du Doubs Besançon NA 4326 6.01815 47.24878 NA Oui Non renseigné Non renseigné NA NA Habitat indéterminé -150:-26 -150:-26 Immobilier
SIT0069 Dragages du Doubs Besançon NA 4326 6.01815 47.24878 NA Oui Non renseigné Non renseigné NA NA Site fortifié -150:-26 -150:-26 Immobilier
SIT0070 Extra muros Besançon NA 4326 6.01815 47.24878 NA Oui Non renseigné Non renseigné NA NA Habitat indéterminé -150:-26 -150:-26 Immobilier
SIT0071 Arsenal Besançon NA 4326 6.01815 47.24878 NA Oui Non renseigné Non renseigné NA NA Habitat indéterminé -150:-26 -150:-26 Immobilier
SIT0071 Arsenal Besançon NA 4326 6.01815 47.24878 NA Oui Non renseigné Non renseigné NA NA Site fortifié -150:-26 -150:-26 Immobilier


Because the sites in the source database are described by only 13 different types and because they can correspond to several entries in the ArkeoGIS thesaurus, it is easier to perform this alignment by hand here. Furthermore, most of these types are not specific enough to fill the four levels of characterisation in the thesaurus. Thus, except for Sanctuary, only CARAC_LVL1 is implemented. Finally, some types do not allow a site to be described and should be removed later. In the meantime, Undetermine is indicated.

# Retrieve in a table the different types of sites

immo2 <- as.data.frame(table(sites2$Type.de.site))

# Add empty characterisation columns to this table

col <- c("CARAC_LVL1", "CARAC_LVL2", "CARAC_LVL3", "CARAC_LVL4")
immo2[, col] <- NA

# For each level of characterisation indicate the corresponding value in ArkeoGIS

immo2$CARAC_LVL1 <- immo2$Var1 %>%
  str_replace_all(c(
    "Site d'exploitation|Indéterminé|Trouvaille isolée|Autre" = NA,
    "Camp militaire|Site fortifié" = "Enceinte",
    "Dépôt" = "Dépôt",
    "Funéraire" = "Funéraire",
    "Habitat.*" = "Habitat",
    "Sanctuaire" = "Rituel"
  ))
immo2$CARAC_LVL1[immo2$Var1 == ""] <- NA

immo2$CARAC_LVL2 <- immo2$Var1 %>%
  str_replace_all(c(
    "Site d'exploitation|Indéterminé|Trouvaille isolée|Autre" = NA,
    "Camp militaire|Site fortifié" = NA,
    "Dépôt" = NA,
    "Funéraire" = NA,
    "Habitat.*" = NA,
    "Sanctuaire" = "Édifice religieux"
  ))
immo2$CARAC_LVL2[immo2$Var1 == ""] <- NA

immo2$CARAC_LVL3 <- immo2$Var1 %>%
  str_replace_all(c(
    "Site d'exploitation|Indéterminé|Trouvaille isolée|Autre" = NA,
    "Camp militaire|Site fortifié" = NA,
    "Dépôt" = NA,
    "Funéraire" = NA,
    "Habitat.*" = NA,
    "Sanctuaire" = "Sanctuaire"
  ))
immo2$CARAC_LVL3[immo2$Var1 == ""] <- NA

# delete the Freq column created by the table function used at the beginning

immo2 <- immo2[,-2]
Table 42: Alignment of site types to the ArkeoGIS thesaurus
Var1 CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4
NA NA NA NA
Autre NA NA NA NA
Camp militaire Enceinte NA NA NA
Dépôt Dépôt NA NA NA
Funéraire Funéraire NA NA NA
Habitat de hauteur Habitat NA NA NA
Habitat de plaine Habitat NA NA NA
Habitat indéterminé Habitat NA NA NA
Indéterminé NA NA NA NA
Sanctuaire Rituel Édifice religieux Sanctuaire NA
Site d'exploitation NA NA NA NA
Site fortifié Enceinte NA NA NA
Trouvaille isolée NA NA NA NA
# Add characterisation columns for each corresponding site

sites2 <- droplevels(merge(sites2, immo2, by.x = "Type.de.site", by.y = "Var1"))
Table 43: Adding the ArkeoGIS thesaurus to the sites2 table
Type.de.site SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME GEONAME_ID PROJECTION_SYSTEM LONGITUDE LATITUDE ALTITUDE CITY_CENTROID OCCUPATION STATE_OF_KNOWLEDGE COMMENTS WEB_IMAGES BIBLIOGRAPHY Activités.artisanales STARTING_PERIOD ENDING_PERIOD CARAC_NAME CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4
208 Funéraire SIT0001 Wederath-Belginum Wederath NA 4326 7.164440 49.86499 NA Oui Non renseigné Fouillé NA NA R. Cordie-Hachenberg, A. Haffner 1991, Haffner 1971 -400:-301 301:400 Immobilier Funéraire NA NA NA
482 Habitat indéterminé SIT0002 Cherain-Brisy Cherain NA 4326 5.865210 50.17996 NA Oui Non renseigné Non renseigné NA NA Indéterminé Indéterminé Immobilier Habitat NA NA NA
616 Indéterminé SIT0003 Han-sur-Lesse Han-sur-Lesse NA 4326 5.186230 50.12674 NA Oui Non renseigné Non renseigné NA NA Indéterminé Indéterminé Immobilier NA NA NA NA
434 Habitat de plaine SIT0004 Klosteräcker Breisach-Hochstetten NA 4326 7.625970 48.02291 NA Oui Non renseigné Non renseigné NA NA Nick 2006 Féliu 2008 -150:-26 -150:-26 Immobilier Habitat NA NA NA
458 Habitat indéterminé SIT0005 Münsterberg Breisach NA 4326 7.579663 48.02931 NA Oui Unique Sondé NA NA ArkeoGIS D. Mischka 2007 Hofmann 1999, 74 Weber-Jenisch 1995, 136 Röder 1995, 176-17 Bender et alii 1993 Féliu 2008 -75:-26 -75:-26 Immobilier Habitat NA NA NA
195 Dépôt SIT0006 Kegelriß Ehrenkirchen-Ehrenstetten NA 4326 7.776640 47.89846 NA Oui Unique Prospecté pédestre NA NA Hofmann 1999, 79 Weber-Jenisch 1995, 138-139 Burkhardt 1998 Féliu 2008 Fer -150:-76 -150:-76 Immobilier Dépôt NA NA NA


The rows for which no alignment could be made still need to be removed. However, among these types, there was the mention of Operation Site which refers to production sites and therefore to the next part (Section 4.4.2). The Real Estate should be changed to Production for these items before deleting all other rows with NA in the CARAC_LVL1 column. However, one of these sites is also distinguished by the column Artisanal.Activities and will therefore be added again in the next step, so it can be deleted

Finally, the columns Type.de.site and BIBLIOGRAPHY can be deleted and the column CARAC_EXP must be added. Last, the table is reordered according to the model defined by ArkeoGIS.

# Distinguish between sites with "Operating sites" as the site type

sites2$CARAC_NAME[sites2$Type.de.site == "Site d'exploitation"] <- "Production"
sites2$CARAC_LVL1[sites2$Type.de.site == "Site d'exploitation"] <- "Autres"

# Remove unnecessary rows

sites2 <- sites2[!(sites2$CARAC_NAME == "Immobilier" & is.na(sites2$CARAC_LVL1) == T),]
sites2 <- sites2[!(sites2$SITE_SOURCE_ID == "SIT0269" & sites2$CARAC_NAME == "Production"),]

# Delete and add the necessary columns

sites2 <- sites2[ ,-c(1, 16)]
sites2$CARAC_EXP <- "Non"

# Reordering the table

sites2 <- sites2[,colnames(arkeoGIS)]
'data.frame':   435 obs. of  22 variables:
 $ SITE_SOURCE_ID    : chr  "SIT0001" "SIT0002" "SIT0004" "SIT0005" ...
 $ SITE_NAME         : chr  "Wederath-Belginum" "Cherain-Brisy" "Klosteräcker" "Münsterberg" ...
 $ MAIN_CITY_NAME    : chr  "Wederath" "Cherain" "Breisach-Hochstetten" "Breisach" ...
 $ GEONAME_ID        : logi  NA NA NA NA NA NA ...
 $ PROJECTION_SYSTEM : chr  "4326" "4326" "4326" "4326" ...
 $ LONGITUDE         : num  7.16 5.87 7.63 7.58 7.78 ...
 $ LATITUDE          : num  49.9 50.2 48 48 47.9 ...
 $ ALTITUDE          : logi  NA NA NA NA NA NA ...
 $ CITY_CENTROID     : Factor w/ 2 levels "Non","Oui": 2 2 2 2 2 2 2 2 1 2 ...
 $ OCCUPATION        : chr  "Non renseigné" "Non renseigné" "Non renseigné" "Unique" ...
 $ STATE_OF_KNOWLEDGE: chr  "Fouillé" "Non renseigné" "Non renseigné" "Sondé" ...
 $ STARTING_PERIOD   : chr  "-400:-301" "Indéterminé" "-150:-26" "-75:-26" ...
 $ ENDING_PERIOD     : chr  "301:400" "Indéterminé" "-150:-26" "-75:-26" ...
 $ CARAC_NAME        : chr  "Immobilier" "Immobilier" "Immobilier" "Immobilier" ...
 $ CARAC_LVL1        : chr  "Funéraire" "Habitat" "Habitat" "Habitat" ...
 $ CARAC_LVL2        : chr  NA NA NA NA ...
 $ CARAC_LVL3        : chr  NA NA NA NA ...
 $ CARAC_LVL4        : logi  NA NA NA NA NA NA ...
 $ CARAC_EXP         : chr  "Non" "Non" "Non" "Non" ...
 $ BIBLIOGRAPHY      : chr  "R. Cordie-Hachenberg, A. Haffner 1991, Haffner 1971" "" "Nick 2006\nFéliu 2008" "ArkeoGIS D. Mischka 2007\nHofmann 1999, 74\nWeber-Jenisch 1995, 136\nRöder 1995, 176-17\nBender et alii 1993\nFéliu 2008" ...
 $ COMMENTS          : logi  NA NA NA NA NA NA ...
 $ WEB_IMAGES        : logi  NA NA NA NA NA NA ...
Table 44: The sites2 table after the various modifications


4.4.2 Types of production


Production types are filled in by the feature Production in ArkeoGIS.

prod <- read.csv2("data/Production.csv", sep = ";", encoding = "UTF-8")
Table 45: Thesaurus extract for describing production in ArkeoGIS
IDArkeoGIS CARAC_NAME CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4 IdArk IdPactols IdAat
361 Production Agricole http://ark.frantiq.fr/ark:/26678/pcrtIdEjFMFAod 13681 http://vocab.getty.edu/page/aat/300004895
362 Production Agricole Grenier http://ark.frantiq.fr/ark:/26678/pcrtGBctSVZOBy 14923 http://vocab.getty.edu/page/aat/300004929
363 Production Agricole Silo http://ark.frantiq.fr/ark:/26678/pcrtdgALrHBSlv 16944 http://vocab.getty.edu/page/aat/300005024
364 Production Céramique https://ark.frantiq.fr/ark:/26678/pcrtYPpmkDk2rS 17239 http://vocab.getty.edu/page/aat/300386308
365 Production Céramique Extraction http://ark.frantiq.fr/ark:/26678/pcrtpZqoqAbrdg 21835 http://vocab.getty.edu/page/aat/300000388
366 Production Céramique Atelier http://ark.frantiq.fr/ark:/26678/pcrtfUZSMbifle 15962 http://vocab.getty.edu/page/aat/300004531
Tip

The thesaurus can also be download here.


Since in the sites2 table the sites have been multiplied according to their number of different types, it is not possible to reuse it directly to fill in the production sites. It is necessary to create a new table Sites_prod using the code used Section 4.1.

# Retrieve unique hits for each site

Sites_prod <- distinct(total, SITE_SOURCE_ID, .keep_all = TRUE)

# Delete the columns that will be replaced

Sites_prod <- Sites_prod[,c(1:11, 21, 22)]

# Delete the contents of the COMMENTS column

Sites_prod[,12] <- NA


In the same way as for the site types, information on bibliography and dating can be added to the table sites.

# Add date and bibliography columns to the `Sites_prod` table

Sites_prod <- droplevels(merge(Sites_prod, sites[,c(1,6,7:10)], by = "SITE_SOURCE_ID"))

# Rename the bibliography column to match ArkeoGIS

colnames(Sites_prod)[14] <- colnames(arkeoGIS[20])
Table 46: Add date and bibliography columns to the Sites_prod table
SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME GEONAME_ID PROJECTION_SYSTEM LONGITUDE LATITUDE ALTITUDE CITY_CENTROID OCCUPATION STATE_OF_KNOWLEDGE COMMENTS WEB_IMAGES BIBLIOGRAPHY Activités.artisanales Type.de.site STARTING_PERIOD ENDING_PERIOD
SIT0001 Wederath-Belginum Wederath NA 4326 7.164440 49.86499 NA Oui Non renseigné Fouillé NA NA R. Cordie-Hachenberg, A. Haffner 1991, Haffner 1971 Funéraire -400:-301 301:400
SIT0002 Cherain-Brisy Cherain NA 4326 5.865210 50.17996 NA Oui Non renseigné Non renseigné NA NA Habitat indéterminé Indéterminé Indéterminé
SIT0003 Han-sur-Lesse Han-sur-Lesse NA 4326 5.186230 50.12674 NA Oui Non renseigné Non renseigné NA NA Indéterminé Indéterminé Indéterminé
SIT0004 Klosteräcker Breisach-Hochstetten NA 4326 7.625970 48.02291 NA Oui Non renseigné Non renseigné NA NA Nick 2006 Féliu 2008 Habitat de plaine -150:-26 -150:-26
SIT0005 Münsterberg Breisach NA 4326 7.579663 48.02931 NA Oui Unique Sondé NA NA ArkeoGIS D. Mischka 2007 Hofmann 1999, 74 Weber-Jenisch 1995, 136 Röder 1995, 176-17 Bender et alii 1993 Féliu 2008 Habitat indéterminé -75:-26 -75:-26
SIT0006 Kegelriß Ehrenkirchen-Ehrenstetten NA 4326 7.776640 47.89846 NA Oui Unique Prospecté pédestre NA NA Hofmann 1999, 79 Weber-Jenisch 1995, 138-139 Burkhardt 1998 Féliu 2008 Fer Dépôt; Habitat indéterminé -150:-76 -150:-76


Finally, this table must be filtered to keep only the sites for which an artisanal activity is mentioned. It is then possible to create the column CARAC_NAME by indicating Production.

# Replace the empty cells in the column Craft Activities with "NA".

Sites_prod[Sites_prod$Activités.artisanales == "", 15] <- NA

# Keep only the lines for which the information Craft activities exists

Sites_prod <- Sites_prod[is.na(Sites_prod$Activités.artisanales) == F,]

# Create the CARAC_NAME column

Sites_prod$CARAC_NAME <- "Production"
Table 47: Extraction of sites for which production information is present
SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME GEONAME_ID PROJECTION_SYSTEM LONGITUDE LATITUDE ALTITUDE CITY_CENTROID OCCUPATION STATE_OF_KNOWLEDGE COMMENTS WEB_IMAGES BIBLIOGRAPHY Activités.artisanales Type.de.site STARTING_PERIOD ENDING_PERIOD CARAC_NAME
6 SIT0006 Kegelriß Ehrenkirchen-Ehrenstetten NA 4326 7.776640 47.89846 NA Oui Unique Prospecté pédestre NA NA Hofmann 1999, 79 Weber-Jenisch 1995, 138-139 Burkhardt 1998 Féliu 2008 Fer Dépôt; Habitat indéterminé -150:-76 -150:-76 Production
7 SIT0007 Limberg Sasbach am Kaiserstuhl NA 4326 7.603129 48.15083 NA Oui Non renseigné Non renseigné NA NA Hofmann 1999, 101 Weber-Jenisch 1995 ArkeoGIS v3 D. Mischka 2007 Féliu 2008 Fer Habitat indéterminé; Site fortifié -150:-26 -150:-26 Production
83 SIT0083 Mont-Châtel Boviolles NA 4326 5.401720 48.63913 NA Non Unique Fouillé NA NA Bronze; Fer; Orfévrerie; Tabletterie; Textile Habitat de hauteur; Site fortifié -150:-76 -25:14 Production
85 SIT0085 La Pierre d'Appel Étival-Clairefontaine NA 4326 6.863134 48.37632 NA Non Non renseigné Fouillé NA NA Féliu 2008 n°041 Fer; Textile Habitat de hauteur; Site fortifié -260:-151 -25:476 Production
86 SIT0086 La Bure Saint-Dié-des-Vosges NA 4326 6.937480 48.32626 NA Non Continue Fouillé NA NA Scholtus 2014 Bronze; Fer; Métallurgie Habitat de hauteur; Site fortifié -200:-151 301:400 Production
168 SIT0169 Butte Sainte-Geneviève Essey-lès-Nancy NA 4326 6.226259 48.71639 NA Oui Unique Fouillé NA NA Féliu 2008 n°040 Burnand 2003, p43. Colin 1998, p. 149-151. Dechezleprêtre 2003 Fer Habitat de hauteur; Indéterminé; Site fortifié -75:-26 -75:-26 Production


As for the previous part (Section 4.4.1), several productions can be entered in the same cell. It is therefore necessary to separate them and duplicate the lines concerned.

# Separate and multiply rows with several types

Sites_prod <- tidyr::separate_rows(Sites_prod, Activités.artisanales, sep = ";")

# Remove superfluous spaces at the beginning and end of cells

Sites_prod$Activités.artisanales <- str_trim(Sites_prod$Activités.artisanales)

Table 48: Column contents Craft.activities

(a) Before
Var1 Freq
1019
Autre 4
Autre; Fer; Textile 1
Bronze; Fer; Métallurgie 1
Bronze; Fer; Orfévrerie; Tabletterie; Textile 1
Fer 14
Fer; Textile 2
Métallurgie 1
Textile 1
(b) After
Var1 Freq
Autre 4
Bronze 2
Fer 16
Métallurgie 2
Orfévrerie 1
Tabletterie 1
Textile 4
Table 49: The Sites_prod table after separation of multiple values
SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME GEONAME_ID PROJECTION_SYSTEM LONGITUDE LATITUDE ALTITUDE CITY_CENTROID OCCUPATION STATE_OF_KNOWLEDGE COMMENTS WEB_IMAGES BIBLIOGRAPHY Activités.artisanales Type.de.site STARTING_PERIOD ENDING_PERIOD CARAC_NAME
SIT0006 Kegelriß Ehrenkirchen-Ehrenstetten NA 4326 7.776640 47.89846 NA Oui Unique Prospecté pédestre NA NA Hofmann 1999, 79 Weber-Jenisch 1995, 138-139 Burkhardt 1998 Féliu 2008 Fer Dépôt; Habitat indéterminé -150:-76 -150:-76 Production
SIT0007 Limberg Sasbach am Kaiserstuhl NA 4326 7.603129 48.15083 NA Oui Non renseigné Non renseigné NA NA Hofmann 1999, 101 Weber-Jenisch 1995 ArkeoGIS v3 D. Mischka 2007 Féliu 2008 Fer Habitat indéterminé; Site fortifié -150:-26 -150:-26 Production
SIT0083 Mont-Châtel Boviolles NA 4326 5.401720 48.63913 NA Non Unique Fouillé NA NA Bronze Habitat de hauteur; Site fortifié -150:-76 -25:14 Production
SIT0083 Mont-Châtel Boviolles NA 4326 5.401720 48.63913 NA Non Unique Fouillé NA NA Fer Habitat de hauteur; Site fortifié -150:-76 -25:14 Production
SIT0083 Mont-Châtel Boviolles NA 4326 5.401720 48.63913 NA Non Unique Fouillé NA NA Orfévrerie Habitat de hauteur; Site fortifié -150:-76 -25:14 Production
SIT0083 Mont-Châtel Boviolles NA 4326 5.401720 48.63913 NA Non Unique Fouillé NA NA Tabletterie Habitat de hauteur; Site fortifié -150:-76 -25:14 Production


As with the site types, the number of different productions described is not so large as to require full automation, and some entries (such as Iron) may apply to more than one line of the thesaurus. Therefore, it seems simpler to do the alignment manually. For all activities, the column CARAC_LVL4 remains empty, so it is not changed.

# Retrieve in a table the different types of activities

prod2 <- as.data.frame(table(Sites_prod$Activités.artisanales))

# Add empty characterisation columns to this table

prod2[,col] <- NA

# For each level of characterisation indicate the corresponding value in ArkeoGIS

prod2$CARAC_LVL1 <- prod2$Var1 %>%
  str_replace_all(c(
    "Autre|Tabletterie|Textile" = "Autres",
    "Bronze|Fer|Métallurgie|Orfévrerie" = "Métal"
  ))

prod2$CARAC_LVL2 <- prod2$Var1 %>%
  str_replace_all(c(
    "Autre|Tabletterie|Textile" = NA,
    "Bronze|Fer|Métallurgie|Orfévrerie" = "Forge"
  ))

prod2$CARAC_LVL3 <- prod2$Var1 %>%
  str_replace_all(c(
    "Autre|Tabletterie|Textile" = NA,
    "Bronze" = "Bronze",
    "Fer" = "Fer",
    "Métallurgie" = NA,
    "Orfévrerie" = "Or"
  ))
# delete the Freq column created by the table function used at the beginning

prod2 <- prod2[,-2]
Table 50: Alignment of production types with the ArkeoGIS thesaurus
Var1 CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4
Autre Autres NA NA NA
Bronze Métal Forge Bronze NA
Fer Métal Forge Fer NA
Métallurgie Métal Forge NA NA
Orfévrerie Métal Forge Or NA
Tabletterie Autres NA NA NA
Textile Autres NA NA NA
# Add characterisation columns for each corresponding site

Sites_prod <- droplevels(merge(Sites_prod, prod2, 
                               by.x = "Activités.artisanales", 
                               by.y = "Var1"))


As before, the columns Activités.artisanales and BIBLIOGRAPHY must be deleted, while the column CARAC_EXP is created. Then the table is rearranged according to the ArkeoGIS model.

# Delete and add the necessary columns

Sites_prod <- Sites_prod[,-c(1, 16)]
Sites_prod$CARAC_EXP <- "Non"

# Reordering the table

Sites_prod <- Sites_prod[, colnames(arkeoGIS)]
Table 51: Adding the ArkeoGIS thesaurus to the Sites_prod table
SITE_SOURCE_ID SITE_NAME MAIN_CITY_NAME GEONAME_ID PROJECTION_SYSTEM LONGITUDE LATITUDE ALTITUDE CITY_CENTROID OCCUPATION STATE_OF_KNOWLEDGE STARTING_PERIOD ENDING_PERIOD CARAC_NAME CARAC_LVL1 CARAC_LVL2 CARAC_LVL3 CARAC_LVL4 CARAC_EXP BIBLIOGRAPHY COMMENTS WEB_IMAGES
7 SIT0006 Kegelriß Ehrenkirchen-Ehrenstetten NA 4326 7.776640 47.89846 NA Oui Unique Prospecté pédestre -150:-76 -150:-76 Production Métal Forge Fer NA Non Hofmann 1999, 79 Weber-Jenisch 1995, 138-139 Burkhardt 1998 Féliu 2008 NA NA
8 SIT0007 Limberg Sasbach am Kaiserstuhl NA 4326 7.603129 48.15083 NA Oui Non renseigné Non renseigné -150:-26 -150:-26 Production Métal Forge Fer NA Non Hofmann 1999, 101 Weber-Jenisch 1995 ArkeoGIS v3 D. Mischka 2007 Féliu 2008 NA NA
6 SIT0083 Mont-Châtel Boviolles NA 4326 5.401720 48.63913 NA Non Unique Fouillé -150:-76 -25:14 Production Métal Forge Bronze NA Non NA NA
10 SIT0083 Mont-Châtel Boviolles NA 4326 5.401720 48.63913 NA Non Unique Fouillé -150:-76 -25:14 Production Métal Forge Fer NA Non NA NA
25 SIT0083 Mont-Châtel Boviolles NA 4326 5.401720 48.63913 NA Non Unique Fouillé -150:-76 -25:14 Production Métal Forge Or NA Non NA NA
26 SIT0083 Mont-Châtel Boviolles NA 4326 5.401720 48.63913 NA Non Unique Fouillé -150:-76 -25:14 Production Autres NA NA NA Non NA NA


5 Assemble the data and export the results


Last, the tables sites2 and Sites_prod are added to the table total and the result is exported in csv format according to the standards requested by ArkeoGIS

# Bringing together the different data sets

total <- rbind(total, sites2, Sites_prod)

# Export the results in csv format

write.csv2(total, "results/Scholtus_2021.csv", sep = ";", na = "", 
          dec = ".", row.names = F, col.names = T, fileEncoding = "UTF-8")
Warning

Before starting the import into ArkeoGIS, it is necessary to check that the csv corresponds to the requested format, in particular for the decimal separator.



We used R version 4.2.2 (R Core Team 2022) and the following R packages: grateful v. 0.1.11 (Rodríguez-Sánchez, Jackson, and Hutchins 2022), kableExtra v. 1.3.4 (Zhu 2021), plyr v. 1.8.7 (Wickham 2011), rlist v. 0.4.6.2 (Ren 2021), rmarkdown v. 2.14 (Xie, Allaire, and Grolemund 2018; Xie, Dervieux, and Riederer 2020; Allaire et al. 2022), tidyverse v. 1.3.1 (Wickham et al. 2019).

This tutorial was translated from the French using the DeepL website

References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2022. Rmarkdown: Dynamic Documents for r. https://github.com/rstudio/rmarkdown.
Barral, Philippe, and Stephan Fichtl, eds. 2012. Regards sur la chronologie de la fin de l’âge du Fer (IIIe - Ier siècle avant J.-C.) en Gaule non méditerranéenne : actes de la table ronde tenue à Bibracte, Glux-en-Glenne, 15 - 17 octobre 2007 ; Table Ronde sur la Chronologie de la Fin de l’âge du Fer dans l’Est de la France. Collection Bibracte 22. Glux-en-Glenne: Bibracte, Centre Archéologique Européen.
Bernard, Loup, Lizzie Scholtus, and Philippe JULLIEN. 2022. “Thésaurus "Immobilier" d’ArkeoGIS-ArkeOpen.” NAKALA - https://nakala.fr (Huma-Num - CNRS). https://doi.org/10.34847/NKL.A7654QJ6.
Bernard, Loup, Lizzie Scholtus, and Philippe Jullien. 2022a. “Thésaurus "Mobilier" d’ArkeoGIS-ArkeOpen.” NAKALA - https://nakala.fr (Huma-Num - CNRS). https://doi.org/10.34847/NKL.510A974O.
———. 2022b. “Thésaurus "Production" d’ArkeoGIS-ArkeOpen.” NAKALA - https://nakala.fr (Huma-Num - CNRS). https://doi.org/10.34847/NKL.3F9BRL3C.
———. 2023. “Chronologie "Europe Occidentale Continentale Centrale Du Néolithique Récent à Nos Jours".” NAKALA - https://nakala.fr (Huma-Num - CNRS). https://doi.org/10.34847/NKL.1C4EMDQ6.
Marlet, Olivier, Bruno Baudoin, Loup Bernard, Laure Bézard, Romain Boissat, Pierre-Yves Buard, Agnieszka Halczuk, et al. 2022. Le Livre Blanc du consortium Mémoires des Archéologues et des Sites Archéologiques : Guide des bonnes pratiques numériques en archéologie. https://halshs.archives-ouvertes.fr/halshs-03561376.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Ren, Kun. 2021. Rlist: A Toolbox for Non-Tabular Data Manipulation. https://CRAN.R-project.org/package=rlist.
Rodríguez-Sánchez, Francisco, Connor P. Jackson, and Shaurita D. Hutchins. 2022. Grateful: Facilitate Citation of r Packages. https://github.com/Pakillo/grateful.
Scholtus, Lizzie. 2021. “Les Faciès Culturels Dans l’est de La Gaule Entre Le IIIe s. Av. J.-C. Et Le IVe s. Apr. J.-C. : Essai de Modélisation.” Theses, Strasbourg: Université de Strasbourg. https://tel.archives-ouvertes.fr/tel-03506297.
Wickham, Hadley. 2011. “The Split-Apply-Combine Strategy for Data Analysis.” Journal of Statistical Software 40 (1): 1–29. https://www.jstatsoft.org/v40/i01/.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.
Zhu, Hao. 2021. kableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.

Footnotes

  1. Matching this chronology will display the data in ArkeoGIS in the colour corresponding to their dating.↩︎

  2. This section will be modified as the thesaurus is being cleaned up. In particular, the “Others” entries will disappear. For this reason, we have chosen not to align with the “Others” level, but with the previous one.↩︎