8.1 Application Packaging
At the end of the Machine Learning section, we saved the trained model as an RDS file, which contains the tidymodels
workflow and its associated metadata. To generate predictions, we need to create an R script that will download and process the input data, pass it through the model, and save the predictions as NetCDF files. By combining this script with the model, we can package them together for deployment, allowing the process to run autonomously in any environment. This can be achieved using Docker for seamless, isolated execution.
Make sure you have the following directory structure to create the Docker image. The custom-base-r folder will be used to generate the base image, while fish_suitability will be used to build the final image.
Docker_Project
├── custom-base-r
│   ├── install_packages.R
│   └── Dockerfile
└── fish_suitability
    ├── Data
    │   ├── Processed
    │   │   └── Parameters
    │   │       └── bathymetry_mean.tif
    │   └── Raw
    │       └── Parameters
    ├── Models
    │   └── workflow_with_metadata.rds
    ├── Scripts
    │   └── get_model_predictions.R
    ├── Outputs
    ├── Dockerfile
    └── requirements.txt
Get Model Predictions R Script
To retrieve predictions from the model, we must first download the required input data. Since our model was trained using the reanalysis ocean physics and biogeochemistry products from CMEMS, we will use the corresponding forecast variables. The process will be similar to the code used in the Handling CMEMS Data section. Although not strictly necessary, it’s helpful to insert print statements between code blocks in the script. These can provide progress updates during the execution of the Docker container or assist in debugging if the execution fails.
Create CMEMS Data Directory
In the code below, we have created a directory (a lookup table) of the variables to download from CMEMS, based on the initial set of variables we used for model fitting. As you may have noticed, there are two versions of each variable: NWS and IBI. These refer to the Northwest Shelf (NWS) and Iberian-Biscay-Irish (IBI) products, respectively. We have included both products because the spatial coverage of one product alone is insufficient. Additionally, we've specified the minimum date for each variable, which depends on the lag days selected for each variable. We also indicate whether we need surface, seafloor, or multilevel (to be converted to seafloor) data. The reason for listing all of this information is to ensure that the code can be reused even if the model's selected variables change during feature selection.
cat("Downloading input data from CMEMS.\n")
<- "YourUsername" # Specify CMEMS username
USERNAME <- "YourPassword" # Specify CMEMS password
PASSWORD
<- paste0("\"", # Specificy the folder to save the NetCDF files
out_dir getwd(),
"/Data/Raw/Parameters",
"\"")
#Time coverage
= Sys.Date() + 10 # Specify dates for forecasting
date_max
library(dplyr)
CMEMS <- tribble(
  ~variable, ~common_name, ~out_name, ~lon, ~lat, ~depth, ~z_levels, ~date_min, ~datasetID,
"bottomT", "bottomT", "bottomT_NWS", c(-15, 13), c(43, 62), c("0.5","0.5"), "one", Sys.Date()-14, "cmems_mod_nws_phy_anfc_0.027deg-3D_P1D-m",
"zos", "ssh", "ssh_NWS", c(-15, 13), c(43, 62), c("0.5","0.5"), "one", Sys.Date()-2, "cmems_mod_nws_phy_anfc_0.027deg-3D_P1D-m",
"nppv", "pp", "pp_NWS", c(-15, 13), c(43, 62), c("0.5","0.5"), "one", Sys.Date()-60, "cmems_mod_nws_bgc_anfc_0.027deg-3D_P1D-m",
"uo", "current", "current_u_NWS", c(-15, 13), c(43, 62), c("0.5","0.5"), "one", Sys.Date()-2, "cmems_mod_nws_phy_anfc_0.027deg-3D_P1D-m",
"vo", "current", "current_v_NWS", c(-15, 13), c(43, 62), c("0.5","0.5"), "one", Sys.Date()-2, "cmems_mod_nws_phy_anfc_0.027deg-3D_P1D-m",
"so", "salinity", "salinity_NWS", c(-15, 13), c(43, 62), c("0.","1000."), "multi", Sys.Date()-14, "cmems_mod_nws_phy_anfc_0.027deg-3D_P1D-m",
"o2", "DO", "DO_NWS", c(-15, 13), c(43, 62), c("0.","1000."), "multi", Sys.Date()-14, "cmems_mod_nws_bgc_anfc_0.027deg-3D_P1D-m",
"ph", "pH", "pH_NWS", c(-15, 13), c(43, 62), c("0.","1000."), "multi", Sys.Date()-14, "cmems_mod_nws_bgc_anfc_0.027deg-3D_P1D-m",
"VHM0", "wave_mean", "wave_NWS", c(-15, 13), c(43, 62), c("0.5","0.5"), "one", Sys.Date()-2, "cmems_mod_nws_wav_anfc_0.027deg_PT1H-i",
"bottomT", "bottomT", "bottomT_IBI", c(-9.9,-0.4), c(42.5,47), c("0.5","0.5"), "one", Sys.Date()-14, "cmems_mod_ibi_phy_anfc_0.027deg-3D_P1D-m",
"zos", "ssh", "ssh_IBI", c(-9.9,-0.4), c(42.5,47), c("0.5","0.5"), "one", Sys.Date()-2, "cmems_mod_ibi_phy_anfc_0.027deg-3D_P1D-m",
"nppv", "pp", "pp_IBI", c(-9.9,-0.4), c(42.5,47), c("0.5","0.5"), "one", Sys.Date()-60, "cmems_mod_ibi_bgc_anfc_0.027deg-3D_P1D-m",
"uo", "current", "current_u_IBI", c(-9.9,-0.4), c(42.5,47), c("0.5","0.5"), "one", Sys.Date()-2, "cmems_mod_ibi_phy_anfc_0.027deg-3D_P1D-m",
"vo", "current", "current_v_IBI", c(-9.9,-0.4), c(42.5,47), c("0.5","0.5"), "one", Sys.Date()-2, "cmems_mod_ibi_phy_anfc_0.027deg-3D_P1D-m",
"so", "salinity", "salinity_IBI", c(-9.9,-0.4), c(42.5,47), c("0.","1000."), "multi", Sys.Date()-14, "cmems_mod_ibi_phy_anfc_0.027deg-3D_P1D-m",
"o2", "DO", "DO_IBI", c(-9.9,-0.4), c(42.5,47), c("0.","1000."), "multi", Sys.Date()-14, "cmems_mod_ibi_bgc_anfc_0.027deg-3D_P1D-m",
"ph", "pH", "pH_IBI", c(-9.9,-0.4), c(42.5,47), c("0.","1000."), "multi", Sys.Date()-14, "cmems_mod_ibi_bgc_anfc_0.027deg-3D_P1D-m",
"VHM0", "wave_mean", "wave_IBI", c(-9.9,-0.4), c(42.5,47), c("0.5","0.5"), "one", Sys.Date()-2, "cmems_mod_ibi_wav_anfc_0.027deg_PT1H-i"
)
Download CMEMS Data
Remember that we have also saved a list of predictors in the model’s metadata (see this subsection). We can extract this list from the metadata and use it to filter the CMEMS directory, ensuring that only the necessary variables are downloaded. This approach prevents the retrieval of unnecessary CMEMS data, optimizing storage and processing efficiency.
trained_workflow <- readRDS("./Models/workflow_with_metadata.rds") # read trained model and its metadata

predictors <- unlist(trained_workflow[["metadata"]][["error"]]$Predictors) # retrieve the predictors of the model

CMEMS_var <- predictors %>% # remove prefix labels (e.g., lag1_, lag2_, lag7_)
  sub("^[^_]*[0-9]+_", "", .)

if("current" %in% CMEMS_var){ # add current_u and current_v to the list if current is one of the predictors
  CMEMS_var <- c(CMEMS_var,"current_u","current_v")
}

CMEMS_var <- unique(CMEMS_var)
CMEMS_var <- CMEMS_var[!grepl("time_trend|depth",CMEMS_var)] # remove non-CMEMS variables

CMEMS <- CMEMS %>%
  filter(common_name %in% CMEMS_var)
for(i in 1:nrow(CMEMS)){
  command <- paste0("copernicusmarine subset",
                    " --username ",USERNAME,
                    " --password ",PASSWORD,
                    " -i ",CMEMS[i,]$datasetID,
                    " -x ",unlist(CMEMS[i,]$lon)[1]," -X ",unlist(CMEMS[i,]$lon)[2],
                    " -y ",unlist(CMEMS[i,]$lat)[1]," -Y ",unlist(CMEMS[i,]$lat)[2],
                    " -z ",unlist(CMEMS[i,]$depth)[1]," -Z ",unlist(CMEMS[i,]$depth)[2],
                    " -t ",CMEMS[i,]$date_min," -T ",date_max,
                    " -v ",CMEMS[i,]$variable,
                    " -o ",out_dir,
                    " -f ",CMEMS[i,]$out_name,".nc")
  system(command, input="Y")
}
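Before launching the full loop, it can be useful to preview what one of these commands looks like. The short sketch below assembles an abbreviated version of the command for the first row of the directory (only a few of the options, for readability) and prints it instead of executing it; preview is just a throwaway name used here for illustration.
# Preview an abbreviated copernicusmarine call for the first row, without executing it
i <- 1
preview <- paste0("copernicusmarine subset",
                  " -i ", CMEMS[i,]$datasetID,
                  " -v ", CMEMS[i,]$variable,
                  " -t ", CMEMS[i,]$date_min, " -T ", date_max,
                  " -f ", CMEMS[i,]$out_name, ".nc")
cat(preview, "\n")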
Process CMEMS Data
Next, we need to process the downloaded CMEMS data following the steps outlined in this subsection. We need a template raster that will be converted to spatial points. These points will be used to extract the values from the CMEMS data and pass them to the model. The prediction locations, which are equidistant from each other, are the centroids of the cells of the prediction raster. Here, we are using a bathymetry raster as a template.
cat("Processing CMEM input data.\n")
library(terra)
library(lubridate)
library(dplyr)
library(stringr)
library(tidyr)
library(magrittr)
# create function to process the CMEMS data
ValueAtDeepest <- function(x){ # x is a SpatRaster object (several depth levels in each time step)
  
  time <- unique(terra::time(x)) # get unique DateTime
  extent <- terra::ext(x) # get extent of the SpatRaster
  x = terra::as.data.frame(x, xy = TRUE, na.rm=FALSE) # convert SpatRaster into a data frame, values in each depth are separated in columns
  
  x <- x %>%
    tidyr::drop_na(contains("_depth=0")) %>% # remove rows that have NA at the surface (i.e., NA at the surface means no values at any depth level)
    dplyr::rename_with(., ~ str_extract(.x,"(?<=\\=)\\d+"),
                       contains("_depth")) %>%
    tidyr::pivot_longer(cols = -c(x,y), # pivot into long format, all parameter values in one column
                        names_to = "depth",
                        values_to = "z") %>%
    dplyr::mutate(depth = as.numeric(depth)) %>%
    dplyr::group_by(x,y) %>% # group by grid cell
    tidyr::drop_na() %>% # remove NAs
    dplyr::arrange(desc(depth), .by_group=TRUE) %>% # arrange depth from deepest to shallowest
    dplyr::slice_head() %>% # only extract the row with the deepest level
    dplyr::select(-depth)
  
  x <- terra::rast(x, type="xyz", crs="epsg:4326") # convert xyz dataframe into a SpatRaster object
  terra::time(x) <- time # label the DateTime based on the original SpatRaster
  names(x) <- time
  x <- terra::extend(x,extent) # extend extent based on the original SpatRaster
  
  return(x)
}
<- rast("./Data/Processed/Parameters/bathymetry_mean.tif") # import depth raster
meanDepth names(meanDepth) <- "depth"
<- as.data.frame(meanDepth, xy=TRUE) # convert depth SpatRaster to a dataframe. Each centroid of the raster becomes a point
template_data <- template_data %>% # convert dataframe to SpatVector
template_data vect(geom=c("x","y"), crs="epsg:4326")
<- list.files(path = "./Data/Raw/Parameter", # list file paths of downloaded CMEMS data
pathList pattern='.nc$', all.files=TRUE, full.names=TRUE)
<- sub("\\.nc","",basename(pathList)) # extract names
names names(pathList) <- names # create names for the list elements
<- vector(mode="list", length=length(pathList)) # create list that will contain the processed raster
rastList names(rastList) <- names # each element corresponds to each processed environmental variable
<- CMEMS %>% # create a vector for surface variables
one_level filter(z_levels=="one") %>%
pull(out_name)
for(i in 1:length(pathList)){
  
  cat(i,"/",length(pathList),": Processing ", names(pathList[i]),"\n", sep="")
  
  var <- names(rastList[i])
  
  if(var %in% one_level){
    
    if(grepl("wave",var)){
      
      raster <- rast(pathList[i])
      terra::time(raster) <- as.POSIXct(substring(terra::time(raster),1,10), # remove time part but retain the date
                                        tz="UTC")
      
      dates <- unique(terra::time(raster))
      temp_list <- vector(mode="list", length=length(dates))
      
      for(j in 1:length(dates)){
        
        temp_list[[j]] <- mean(raster[[terra::time(raster)==dates[j]]]) # compute mean wave height per day (raw data is hourly)
        names(temp_list[[j]]) <- gsub("_","_mean_",names[i])
        terra::time(temp_list[[j]]) <- dates[j]
        
      }
      
      rastList[[i]] <- rast(temp_list)
      
    } else {
      
      rastList[[var]] <- rast(pathList[[var]]) # open other surface variables as SpatRasters
      names(rastList[[var]]) <- rep(var, times=terra::nlyr(rastList[[var]]))
      
    }
    
  } else { # extract seafloor values for multi-level variables
    
    raster <- rast(pathList[i])
    dates <- unique(terra::time(raster))
    temp_list <- vector(mode="list", length=length(dates))
    
    for(j in 1:length(dates)){
      
      subRaster <- raster[[terra::time(raster)==dates[j]]]
      check <- as.data.frame(subRaster)
      if(nrow(check)==0){break}
      temp_list[[j]] <- ValueAtDeepest(subRaster)
      names(temp_list[[j]]) <- names[i]
      
    }
    
    rastList[[i]] <- rast(temp_list)
    
  }
}
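At this point it can be worth a quick sanity check that each element of rastList was filled and spans the expected dates. The snippet below is optional and assumes every variable downloaded and processed successfully (i.e., no empty list elements); it only prints the number of layers and the date range per variable.
# Optional sanity check: layer count and date coverage of each processed variable
for(k in seq_along(rastList)){
  r <- rastList[[k]]
  cat(names(rastList)[k], ": ",
      terra::nlyr(r), " layers, ",
      as.character(min(terra::time(r))), " to ",
      as.character(max(terra::time(r))), "\n", sep="")
}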
Build Input Dataframes
Now that the CMEMS forecast data has been processed, we need to construct the daily input dataframes for the model to generate predictions. The number of lagged days for each predictor can be adjusted based on the selection used during model training.
In the final stages of the code, new variables will be created as described in the Create New Predictors subsection under Data Exploration. The coalesce() function will consolidate similar variables into a single column (e.g., merging bottomT_NWS and bottomT_IBI). Additionally, the repeated if statements in the latter part of the pipeline ensure that new predictors are generated only if they are present in the extracted dataframe.
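If these two idioms are unfamiliar, the toy example below (not part of the script) shows how they behave: coalesce() keeps the first non-missing value across the paired columns, and wrapping if (...) ... else . in braces inside a pipe applies a mutate() only when the condition holds, otherwise passing the data through unchanged. The objects toy and toy_predictors are made up purely for illustration.
# Toy illustration of coalesce() and the {if (...) ... else .} pipe idiom
library(dplyr)
library(stringr)

toy <- tibble(bottomT_NWS = c(10.2, NA, 9.8),
              bottomT_IBI = c(10.0, 11.5, NA))

toy_predictors <- c("bottomT", "delta7_bottomT") # pretend list of model predictors

toy %>%
  mutate(bottomT = coalesce(bottomT_NWS, bottomT_IBI)) %>% # NWS value where available, IBI otherwise
  select(-ends_with(c("_NWS", "_IBI"))) %>%
  {if(any(str_detect(toy_predictors, "bottomT"))) mutate(., bottomT_is_predictor = TRUE) else .}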
cat("Building input dataframe.\n")
rm(list=setdiff(ls(),c("trained_workflow","predictors","template_data",
"rastList")))
date_min <- as.POSIXct(as.character(Sys.Date()),tz="UTC") # set reference date to today
date_max <- lapply(rastList,terra::time) %>% # set the latest forecast date based on the shortest available CMEMS forecast
  lapply(., max) %>%
  do.call("c",.) %>%
  min()

dates_seq <- seq(date_min, date_max, by = "day") # specify prediction dates
df_list <- vector(mode="list", length=length(dates_seq)) # create list that will contain the input dataframe per day
for(i in 1:length(dates_seq)){ # create input dataframe for each forecast date, looping through the prediction dates
  
  newdata <- template_data # this template data comes from the bathymetry raster
  cat("Date:",as.character(dates_seq[i]),"\n")
  
  for(j in 1:length(rastList)){
    
    raster <- rastList[[j]]
    date <- dates_seq[i]
    cat("  Extracting ",unique(names(raster))," values.\n",sep="")
    
    subRaster <- raster[[terra::time(raster)==date]]
    newdata <- terra::extract(subRaster,newdata,bind=TRUE)
    
    if(grepl(x=unique(names(raster)), # if the variables being processed are current, wave or ssh, then extract the following day lags: 1 and 2
             pattern = "current_u|current_v|wave_mean|ssh")){
      
      date <- date - lubridate::days(1)
      subRaster <- raster[[terra::time(raster)==date]]
      names(subRaster) <- paste0("lag1_",names(subRaster))
      newdata <- terra::extract(subRaster,newdata,bind=TRUE)
      gc()
      
      date <- date - lubridate::days(1)
      subRaster <- raster[[terra::time(raster)==date]]
      names(subRaster) <- paste0("lag2_",names(subRaster))
      newdata <- terra::extract(subRaster,newdata,bind=TRUE)
      gc()
      
    } else if(grepl(x=unique(names(raster)), # if the variable being processed is pp, then extract the following day lags: 30 and 60
                    pattern = "pp")){
      
      date <- date - lubridate::days(30)
      subRaster <- raster[[terra::time(raster)==date]]
      names(subRaster) <- paste0("lag30_",names(subRaster))
      newdata <- terra::extract(subRaster,newdata,bind=TRUE)
      gc()
      
      date <- date - lubridate::days(30)
      subRaster <- raster[[terra::time(raster)==date]]
      names(subRaster) <- paste0("lag60_",names(subRaster))
      newdata <- terra::extract(subRaster,newdata,bind=TRUE)
      gc()
      
    } else { # if the variables being processed are salinity, bottomT, pH or DO, then extract the following day lags: 7 and 14
      
      date <- date - lubridate::days(7)
      subRaster <- raster[[terra::time(raster)==date]]
      names(subRaster) <- paste0("lag7_",names(subRaster))
      newdata <- terra::extract(subRaster,newdata,bind=TRUE)
      gc()
      
      date <- date - lubridate::days(7)
      subRaster <- raster[[terra::time(raster)==date]]
      names(subRaster) <- paste0("lag14_",names(subRaster))
      newdata <- terra::extract(subRaster,newdata,bind=TRUE)
      gc()
      
    }
  }
  rm(list=c("raster","subRaster"))
  
  newdata <- as.data.frame(newdata, geom="XY") %>%
    mutate(Date = dates_seq[i],
           time_trend = as.numeric(Date - min(as.POSIXct("2006-01-11 00:00:00",tz="UTC")))/1e9,
           Longitude = x,
           Latitude = y,
           across(ends_with("_NWS"),
                  ~ coalesce(., get(sub("_NWS", "_IBI", cur_column()))),
                  .names = "{gsub('_NWS','', .col)}")) %>%
    select(-ends_with(c("_NWS", "_IBI"))) %>%
    {if(any(str_detect(predictors,"current"))) mutate(., current = sqrt(current_u^2 + current_v^2),
                                                      lag1_current = sqrt(lag1_current_u^2 + lag1_current_v^2),
                                                      lag2_current = sqrt(lag2_current_u^2 + lag2_current_v^2),
                                                      delta1_current = current - lag1_current,
                                                      delta2_current = current - lag2_current,
                                                      delta1_current_u = current_u - lag1_current_u,
                                                      delta2_current_u = current_u - lag2_current_u,
                                                      delta1_current_v = current_v - lag1_current_v,
                                                      delta2_current_v = current_v - lag2_current_v) else .} %>%
    {if(any(str_detect(predictors,"ssh"))) mutate(., delta1_ssh = ssh - lag1_ssh,
                                                  delta2_ssh = ssh - lag2_ssh) else .} %>%
    {if(any(str_detect(predictors,"wave_mean"))) mutate(., delta1_wave_mean = wave_mean - lag1_wave_mean,
                                                        delta2_wave_mean = wave_mean - lag2_wave_mean) else .} %>%
    {if(any(str_detect(predictors,"DO"))) mutate(., delta7_DO = DO - lag7_DO,
                                                 delta14_DO = DO - lag14_DO) else .} %>%
    {if(any(str_detect(predictors,"pH"))) mutate(., delta7_pH = pH - lag7_pH,
                                                 delta14_pH = pH - lag14_pH) else .} %>%
    {if(any(str_detect(predictors,"pp"))) mutate(., delta30_pp = pp - lag30_pp,
                                                 delta60_pp = pp - lag60_pp) else .} %>%
    {if(any(str_detect(predictors,"salinity"))) mutate(., delta7_salinity = salinity - lag7_salinity,
                                                       delta14_salinity = salinity - lag14_salinity) else .} %>%
    {if(any(str_detect(predictors,"bottomT"))) mutate(., delta7_bottomT = bottomT - lag7_bottomT,
                                                      delta14_bottomT = bottomT - lag14_bottomT) else .} %>%
    select(Date,Longitude,Latitude,any_of(predictors)) %>%
    drop_na()
  
  df_list[[i]] <- newdata
  
  gc()
}
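As a quick optional check before prediction, you can inspect the first daily dataframe and confirm that every predictor the model expects is present; the setdiff() call should ideally return character(0).
# Optional: inspect the first daily input dataframe
str(df_list[[1]])
setdiff(predictors, names(df_list[[1]])) # predictors the model expects but that are missing from the dataframe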
Save Predictions into a NetCDF file
Finally, we can use the dataframes created from the forecasted CMEMS data as input for the model. Each day of predictions will be saved as a separate layer in a NetCDF file. Since we have enriched the model with metadata, we can also include this metadata as additional information when writing the NetCDF file, ensuring better documentation and traceability of the model predictions.
cat("Getting model predictions.\n")
library(workflows)
library(ranger)
library(terra)
<- trained_workflow[["metadata"]][["model_metrics"]] %>%
error filter(.metric=="pr_auc" & test_type=="test_time")
<- "SOL"
FAO_code <- "Common Sole"
common_name <- "Solea solea"
scientific_name <- paste0("title=Daily Predictions of Fishing Suitability of ",common_name," (",scientific_name,")")
title <- paste0("institution=",trained_workflow[["metadata"]][["institute"]])
institution <- paste0("source=",trained_workflow[["metadata"]][["model"]])
source <- paste0("comment=",trained_workflow[["metadata"]][["training_data"]]," ",
comment "metadata"]][["notes"]]," ",
trained_workflow[[$test_type_info,", yielding an ",tolower(error$.metric_info), " of ",
errorround(error$.estimate,3),". The model performance was evaluated on ",
format(trained_workflow[["metadata"]][["creation_date"]],"%d-%b-%Y"),".")
<- paste0("contact=",trained_workflow[["metadata"]][["contact"]])
contact <- paste0("library=",trained_workflow[["metadata"]][["library"]])
library
<- trained_workflow[["workflow"]]
model
<- vector(mode="list", length=length(dates_seq)) #list that will contain the raster predictions
pred_list
for(i in 1:length(df_list)){
<- df_list[[i]]
df
if(nrow(df)==0){next}
<- predict(model,df) %>%
predictions select(.pred) %>%
cbind(df,.) %>%
mutate(.pred = .pred*100) %>%
rename(fishing_suitability = .pred,
x = Longitude,
y = Latitude) %>%
select(x,y,fishing_suitability) %>%
rast(.,type="xyz", crs="EPSG:4326") #convert to raster
::time(predictions) <- unique(df$Date)
terra
<- predictions
pred_list[[j]]
}
raster <- rast(pred_list) # bind all rasters into one object

terra::writeCDF(raster, # export as NetCDF file
                overwrite = TRUE,
                varname = paste0(FAO_code,"_predictions"),
                longname = paste0("Predicted Fishing Suitability of ",scientific_name),
                unit = "%",
                atts = c(title,institution,source,comment,contact,library),
                filename = paste0("./Outputs/",FAO_code,"_predictions_",Sys.Date(),".nc"))
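If you want to confirm that the file was written correctly, you can read it back and inspect the layers and global attributes. This optional check uses terra and ncdf4, both of which are in the package list installed in the image.
# Optional check: read the exported NetCDF back and inspect it
out_file <- paste0("./Outputs/",FAO_code,"_predictions_",Sys.Date(),".nc")

check_rast <- terra::rast(out_file)
terra::time(check_rast) # one layer per forecast date

nc <- ncdf4::nc_open(out_file)
ncdf4::ncatt_get(nc, 0) # global attributes (title, institution, source, comment, contact, library)
ncdf4::nc_close(nc)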
We can then put all of the code above into one R script. Let's name it get_model_predictions.R and save it in the Scripts folder, as indicated in the directory structure above.
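Before containerizing, you may want to run the script once on your own machine (with R, the required packages, and the copernicusmarine toolbox installed) to confirm that it completes end to end. The path below is only a placeholder for your local copy of the project.
# Optional local test run; the script uses paths relative to the fish_suitability folder
setwd("C:/Path/to/your/Docker_Project/fish_suitability") # placeholder path, adjust to your setup
source("./Scripts/get_model_predictions.R")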
Containerize the Model into a Docker Image
We have mentioned Docker a couple of times in this section, but what is Docker? Docker is an open-source platform that allows developers to build, package, and run applications in lightweight, portable containers. These containers include everything an application needs to run, such as libraries, dependencies, and configuration files, ensuring that it works consistently across different environments.
How does it work? Developers write code and define dependencies in a Dockerfile. Then, Docker builds an image based on the Dockerfile. The image is then stored in a registry like Docker Hub or a private repository. Lastly, containers are created from the image and run on any Docker-compatible system.
In our use case, we are using a Windows VM running Windows Server 2022 Datacenter. On this virtual machine, specific installations are required for Docker to run. This is well documented in this link. The two important installations are the following.
Install the Windows Server 2022 containers feature
Install Hyper-V in Windows Server 2022
Also, Docker Desktop for Windows needs to be installed on this server. The process also requires the installation of the Windows Subsystem for Linux (WSL). Once the server is correctly set up, we can move on to the next part: creating the Docker images and running the container.
Creating the Docker Image
Since the model runs in an R environment, we first need to create an image based on r-base. We are using the rocker/r-base image maintained by the Rocker Project.
Create the Image custom-base-r
We first need to create a Dockerfile. It is a file that contains a set of instructions for building a Docker image: it defines the steps to create a custom image by specifying the base image, dependencies, configuration, and commands that need to be executed within the image. The Dockerfile is like a blueprint or recipe for creating a Docker container environment. To create one, open Notepad++ and paste the following code. Then, save the file with the name Dockerfile, without a file extension. Make sure the file type is set to "All types".
FROM rocker/r-base:latest
RUN apt-get update
RUN apt-get -y install build-essential xml2 openssl nano
RUN apt-get -y install libfontconfig1-dev
RUN apt-get -y install libgdal-dev libgeos-dev libproj-dev libharfbuzz-dev libfribidi-dev
RUN apt-get update
COPY install_packages.R /install_packages.R
RUN Rscript /install_packages.R
We also need a separate R script that installs the R packages required by get_model_predictions.R. Save it as install_packages.R in the custom-base-r folder, next to the Dockerfile, as shown in the directory structure above.
options(repos = "https://cloud.r-project.org") # setting a CRAN mirror
install.packages(c("stats"), repos = "https://cloud.r-project.org")
install.packages(c("graphics"), repos = "https://cloud.r-project.org")
install.packages(c("grDevices"), repos = "https://cloud.r-project.org")
install.packages(c("datasets"), repos = "https://cloud.r-project.org")
install.packages(c("methods"), repos = "https://cloud.r-project.org")
install.packages(c("base"), repos = "https://cloud.r-project.org")
install.packages(c("ranger"), repos = "https://cloud.r-project.org")
install.packages(c("lightgbm"), repos = "https://cloud.r-project.org")
install.packages(c("workflows"), repos = "https://cloud.r-project.org")
install.packages(c("magrittr"), repos = "https://cloud.r-project.org")
install.packages(c("tidyr"), repos = "https://cloud.r-project.org")
install.packages(c("stringr"), repos = "https://cloud.r-project.org")
install.packages(c("dplyr"), repos = "https://cloud.r-project.org")
install.packages(c("lubridate"), repos = "https://cloud.r-project.org")
install.packages(c("terra"), repos = "https://cloud.r-project.org")
install.packages(c("tidyselect"), repos = "https://cloud.r-project.org")
install.packages(c("listenv"), repos = "https://cloud.r-project.org")
install.packages(c("purrr"), repos = "https://cloud.r-project.org")
install.packages(c("splines"), repos = "https://cloud.r-project.org")
install.packages(c("lattice"), repos = "https://cloud.r-project.org")
install.packages(c("parsnip"), repos = "https://cloud.r-project.org")
install.packages(c("colorspace"), repos = "https://cloud.r-project.org")
install.packages(c("vctrs"), repos = "https://cloud.r-project.org")
install.packages(c("generics"), repos = "https://cloud.r-project.org")
install.packages(c("utf8"), repos = "https://cloud.r-project.org")
install.packages(c("survival"), repos = "https://cloud.r-project.org")
install.packages(c("prodlim"), repos = "https://cloud.r-project.org")
install.packages(c("rlang"), repos = "https://cloud.r-project.org")
install.packages(c("pillar"), repos = "https://cloud.r-project.org")
install.packages(c("glue"), repos = "https://cloud.r-project.org")
install.packages(c("withr"), repos = "https://cloud.r-project.org")
install.packages(c("lifecycle"), repos = "https://cloud.r-project.org")
install.packages(c("lava"), repos = "https://cloud.r-project.org")
install.packages(c("timeDate"), repos = "https://cloud.r-project.org")
install.packages(c("munsell"), repos = "https://cloud.r-project.org")
install.packages(c("gtable"), repos = "https://cloud.r-project.org")
install.packages(c("future"), repos = "https://cloud.r-project.org")
install.packages(c("recipes"), repos = "https://cloud.r-project.org")
install.packages(c("codetools"), repos = "https://cloud.r-project.org")
install.packages(c("parallel"), repos = "https://cloud.r-project.org")
install.packages(c("class"), repos = "https://cloud.r-project.org")
install.packages(c("fansi"), repos = "https://cloud.r-project.org")
install.packages(c("Rcpp"), repos = "https://cloud.r-project.org")
install.packages(c("scales"), repos = "https://cloud.r-project.org")
install.packages(c("ipred"), repos = "https://cloud.r-project.org")
install.packages(c("jsonlite"), repos = "https://cloud.r-project.org")
install.packages(c("parallelly"), repos = "https://cloud.r-project.org")
install.packages(c("dials"), repos = "https://cloud.r-project.org")
install.packages(c("ggplot2"), repos = "https://cloud.r-project.org")
install.packages(c("digest"), repos = "https://cloud.r-project.org")
install.packages(c("stringi"), repos = "https://cloud.r-project.org")
install.packages(c("bonsai"), repos = "https://cloud.r-project.org")
install.packages(c("ncdf4"), repos = "https://cloud.r-project.org")
install.packages(c("grid"), repos = "https://cloud.r-project.org")
install.packages(c("DiceDesign"), repos = "https://cloud.r-project.org")
install.packages(c("hardhat"), repos = "https://cloud.r-project.org")
install.packages(c("cli"), repos = "https://cloud.r-project.org")
install.packages(c("tools"), repos = "https://cloud.r-project.org")
install.packages(c("tibble"), repos = "https://cloud.r-project.org")
install.packages(c("future.apply"), repos = "https://cloud.r-project.org")
install.packages(c("pkgconfig"), repos = "https://cloud.r-project.org")
install.packages(c("ellipsis"), repos = "https://cloud.r-project.org")
install.packages(c("MASS"), repos = "https://cloud.r-project.org")
install.packages(c("Matrix"), repos = "https://cloud.r-project.org")
install.packages(c("data.table"), repos = "https://cloud.r-project.org")
install.packages(c("timechange"), repos = "https://cloud.r-project.org")
install.packages(c("gower"), repos = "https://cloud.r-project.org")
install.packages(c("rstudioapi"), repos = "https://cloud.r-project.org")
install.packages(c("R6"), repos = "https://cloud.r-project.org")
install.packages(c("globals"), repos = "https://cloud.r-project.org")
install.packages(c("rpart"), repos = "https://cloud.r-project.org")
install.packages(c("nnet"), repos = "https://cloud.r-project.org")
install.packages(c("compiler"), repos = "https://cloud.r-project.org")
install.packages(c("xgboost"), repos = "https://cloud.r-project.org", dependencies=TRUE)
After setting up the directories and creating all necessary files, we can proceed with building the base Docker image. This image will be derived from rocker/r-base, an official R environment maintained by the Rocker Project. In the Command Prompt, change the directory to the custom-base-r folder using cd, then run the docker build command below. docker build creates a Docker image from the specified Dockerfile; -t custom-base-r assigns the name custom-base-r to the newly created image; and the . at the end specifies the build context, which is the current directory. This directory should contain the Dockerfile and any other necessary files.
cd "C:\Path\to\your\Docker_Project\custom-base-r"
docker build -t custom-base-r .
Build the Final Docker Image
Once you have built the modified R base image, we can build the final image for deployment. Make sure that there is another Dockerfile in the fish_suitability folder with the following content. The requirements.txt file alongside it should list the Python packages needed inside the container; at a minimum the copernicusmarine toolbox, since get_model_predictions.R calls the copernicusmarine command-line interface through system().
FROM custom-base-r
############################
## PYTHON from pyenv
############################
RUN apt-get update
RUN apt-get install -y git
RUN apt-get update
RUN git clone https://github.com/pyenv/pyenv.git /.pyenv
ENV PYENV_ROOT="/.pyenv"
ENV PATH="/.pyenv/bin:${PATH}"
ARG PYTHON_VERSION=3.11
ENV PYTHON_VERSION=${PYTHON_VERSION}
RUN pyenv install ${PYTHON_VERSION}
RUN git clone https://github.com/pyenv/pyenv-virtualenv.git $PYENV_ROOT/plugins/pyenv-virtualenv
RUN pyenv virtualenv ${PYTHON_VERSION} base && \
pyenv global base
ENV PATH=".pyenv/versions/base/bin:$PATH"
############################
## Model requirements
############################
COPY Scripts /Scripts
COPY Models /Models
COPY Data /Data
COPY Outputs /Outputs
############################
## Execution
############################
COPY requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
CMD Rscript /Scripts/get_model_predictions.R
Then navigate to the fish_suitability folder in the Command Prompt and run the following docker command.
cd "C:\Path\to\your\Docker_Project\fish_suitability"
docker build -t fish_suitability .
You can double-check that the image was indeed created by executing the following command. The fish_suitability image should be listed under the REPOSITORY column.
docker images
We can now run a Docker container from the newly created image. In order to get the prediction output files (i.e., the NetCDF files), we need to mount a local directory where these files will be saved.
docker run -it --rm -v /path/to/model/outputs:/Outputs fish_suitability
docker run - Runs a new container from a Docker image.
-i - Keeps the standard input (stdin) open so you can interact with the container.
-t - Allocates a pseudo-terminal (TTY), making it possible to interact with the container via the command line.
--rm - Removes the container automatically after it stops.
-v - Mounts a volume, i.e., shares a folder between your local machine and the Docker container.
/path/to/model/outputs - The directory on your local machine.
/Outputs - The directory inside the container where the local folder will be accessible.
fish_suitability - The Docker image used to create the container.