20-01-2017
catchunit
- Define unit of catch observations
dteuler and eulertype
- Temporal discretisation and time step
msytype
- Stochastic and deterministic reference points
do.sd.report
- Perform SD report calculations
reportall
- Report all derived quantities
optim.method
- Optimisation method

This vignette explains basic and more advanced functions of the spict
package. The package is installed from github using the devtools
package:
devtools::install_github("mawp/spict/spict")
installs the stable version of spict
and
devtools::install_github("mawp/spict/spict", ref = "dev")
installs the current development version, which has new features but is not yet fully tested. When loading the package you are notified which version of the package you have installed:
library(spict)
#> Loading required package: TMB
#> Warning: package 'TMB' was built under R version 3.2.5
#> Welcome to spict_v1.2@abb75ddf6c9447ff255fa11bed8da106bff9f849
The printed version follows the format ver@SHA, where ver is the manually defined version number and SHA refers to a unique commit on github. The content of this vignette pertains to the version printed above.
The package contains the catch and index data analysed in Polacheck, Hilborn, and Punt (1993). This data can be loaded by typing
data(pol)
Data on three stocks are contained in this dataset: South Atlantic albacore, northern Namibian hake, and New Zealand rock lobster. Here focus will be on the South Atlantic albacore data. This dataset contains the following
pol$albacore
#> $obsC
#> [1] 15.9 25.7 28.5 23.7 25.0 33.3 28.2 19.7 17.5 19.3 21.6 23.1 22.5 22.5
#> [15] 23.6 29.1 14.4 13.2 28.4 34.6 37.5 25.9 25.3
#>
#> $timeC
#> [1] 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980
#> [15] 1981 1982 1983 1984 1985 1986 1987 1988 1989
#>
#> $obsI
#> [1] 61.89 78.98 55.59 44.61 56.89 38.27 33.84 36.13 41.95 36.63 36.33
#> [12] 38.82 34.32 37.64 34.01 32.16 26.88 36.61 30.07 30.75 23.36 22.36
#> [23] 21.91
#>
#> $timeI
#> [1] 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980
#> [15] 1981 1982 1983 1984 1985 1986 1987 1988 1989
Note that data are structured as a list containing the entries obsC (catch observations), timeC (times of catch observations), obsI (index observations), and timeI (times of index observations). If times are not specified, it is assumed that the first observation is made at time 1 and that the rest follow sequentially with a time step of one year. It is therefore recommended to always specify observation times.
Each catch observation relates to a time interval. This is specified using dtc. If dtc is left unspecified (as is the case here), each catch observation is assumed to cover the time interval until the next catch observation. For this example with annual catches dtc is therefore
inp <- check.inp(pol$albacore)
inp$dtc
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
It is important to specify dtc if the default assumption is not fulfilled.
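For illustration, here is a minimal sketch of an input list with hypothetical quarterly catches, where each observation covers a quarter of a year (the catch numbers are made up for this example):

```r
# Hypothetical quarterly catches: two years, four observations per year
inpq <- list()
inpq$timeC <- seq(1990, 1991.75, by = 0.25)  # start time of each quarter
inpq$obsC  <- c(4.0, 3.9, 4.1, 3.9, 6.4, 6.5, 6.4, 6.4)
inpq$dtc   <- rep(0.25, length(inpq$obsC))   # each catch covers 1/4 year
# check.inp(inpq) would then process the list (an index series would also
# be needed for a full fit)
```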
The data can be plotted using the command
plotspict.data(pol$albacore)
Note that the number of catch and index observations is given in the respective plot headers. Furthermore, the colour of each point shows when the observation was made, and the corresponding colours are shown in the colour legend in the top right corner. For illustrative purposes let's try shifting the data a bit
inpshift <- pol$albacore
inpshift$timeC <- inpshift$timeC + 0.3
inpshift$timeI <- inpshift$timeI + 0.8
plotspict.data(inpshift)
Now the colours show that catches are observed in spring and index in autumn.
There is also a more advanced function for plotting data, which at the same time does some basic model fitting (linear regression) and shows the results
plotspict.ci(pol$albacore)
The two top plots come from plotspict.data, with the dashed horizontal line representing a guess of MSY. This guess comes from a linear regression between the index and the catch divided by the index (middle row, left). This regression is expected to have a negative slope. A similar plot can be made showing catch versus catch/index (middle row, right) to approximately find the optimal effort (or effort proxy). The proportional increase in the index as a function of catch (bottom row, right) should show primarily positive increases in the index at low catches and vice versa. Positive increases in the index at large catches could indicate model violations. In the current plot these are not seen.
The model is fitted to data by running
res <- fit.spict(pol$albacore)
The call to fit.spict can be wrapped in the system.time command to check the time spent on the calculations. This is not required, but doing so shows that fitting the model only takes a few seconds. The result of the model fit is stored in res, which can either be plotted using plot or summarised using summary.
The results are returned as a list that contains output as well as input. The content of this list is
names(res)
#> [1] "value" "sd" "cov"
#> [4] "par.fixed" "cov.fixed" "pdHess"
#> [7] "gradient.fixed" "par.random" "diag.cov.random"
#> [10] "env" "inp" "obj"
#> [13] "opt" "pl" "Cp"
#> [16] "report" "computing.time"
Many of these variables are generated by TMB::sdreport(). In addition, spict includes the list of input values (inp), the object used for fitting (obj), the result from the optimiser (opt), the time spent on fitting the model (computing.time), and other, less frequently used variables.
The results are summarised using
capture.output(summary(res))
#> [1] "Convergence: 0 MSG: relative convergence (4)"
#> [2] "Objective function at optimum: 2.0654958"
#> [3] "Euler time step (years): 1/16 or 0.0625"
#> [4] "Nobs C: 23, Nobs I1: 23"
#> [5] ""
#> [6] "Priors"
#> [7] " logn ~ dnorm[log(2), 2^2]"
#> [8] " logalpha ~ dnorm[log(1), 2^2]"
#> [9] " logbeta ~ dnorm[log(1), 2^2]"
#> [10] ""
#> [11] "Fixed parameters"
#> [12] " fixed.value "
#> [13] " phi NA "
#> [14] ""
#> [15] "Model parameter estimates w 95% CI "
#> [16] " estimate cilow ciupp log.est "
#> [17] " alpha 8.5381047 1.2232709 59.5936950 2.1445391 "
#> [18] " beta 0.1212590 0.0180688 0.8137626 -2.1098264 "
#> [19] " r 0.2556015 0.1010594 0.6464726 -1.3641356 "
#> [20] " rc 0.7435358 0.1445714 3.8240307 -0.2963383 "
#> [21] " rold 0.8180029 0.0019100 350.3332251 -0.2008894 "
#> [22] " m 22.5827681 17.0681861 29.8790634 3.1171871 "
#> [23] " K 201.4754019 138.1193807 293.8931334 5.3056673 "
#> [24] " q 0.3512548 0.1942689 0.6350989 -1.0462433 "
#> [25] " n 0.6875298 0.0636701 7.4241653 -0.3746501 "
#> [26] " sdb 0.0128136 0.0018406 0.0892015 -4.3572484 "
#> [27] " sdf 0.3673760 0.2673608 0.5048054 -1.0013693 "
#> [28] " sdi 0.1094038 0.0808973 0.1479555 -2.2127093 "
#> [29] " sdc 0.0445477 0.0073370 0.2704792 -3.1111957 "
#> [30] " "
#> [31] "Deterministic reference points (Drp)"
#> [32] " estimate cilow ciupp log.est "
#> [33] " Bmsyd 60.7442629 15.4031099 239.553279 4.1066726 "
#> [34] " Fmsyd 0.3717679 0.0722857 1.912015 -0.9894855 "
#> [35] " MSYd 22.5827681 17.0681861 29.879063 3.1171871 "
#> [36] "Stochastic reference points (Srp)"
#> [37] " estimate cilow ciupp log.est rel.diff.Drp "
#> [38] " Bmsys 60.7366125 15.4032686 239.490475 4.1065467 -1.259603e-04 "
#> [39] " Fmsys 0.3717801 0.0722788 1.912323 -0.9894528 3.276944e-05 "
#> [40] " MSYs 22.5806624 17.0626510 29.883183 3.1170939 -9.325179e-05 "
#> [41] ""
#> [42] "States w 95% CI (inp$msytype: s)"
#> [43] " estimate cilow ciupp log.est "
#> [44] " B_1989.00 59.1917177 31.0255685 112.9281305 4.0807816 "
#> [45] " F_1989.00 0.4160742 0.2048126 0.8452494 -0.8768917 "
#> [46] " B_1989.00/Bmsy 0.9745640 0.3430184 2.7688752 -0.0257651 "
#> [47] " F_1989.00/Fmsy 1.1191406 0.2899282 4.3199506 0.1125611 "
#> [48] ""
#> [49] "Predictions w 95% CI (inp$msytype: s)"
#> [50] " prediction cilow ciupp log.est "
#> [51] " B_1990.00 56.5242669 30.0511479 106.3184926 4.0346700 "
#> [52] " F_1990.00 0.4464499 0.2098831 0.9496596 -0.8064282 "
#> [53] " B_1990.00/Bmsy 0.9306457 0.2932030 2.9539311 -0.0718766 "
#> [54] " F_1990.00/Fmsy 1.2008440 0.2832215 5.0915131 0.1830246 "
#> [55] " Catch_1990.00 24.7359893 15.3328280 39.9058260 3.2082592 "
#> [56] " E(B_inf) 49.9856425 NA NA 3.9117358 "
Here capture.output() is only used to provide line numbers for easier reference; the summary() command works without it.
The summary contains the following sections:

- Priors: the priors used in the fit, here on logn, logalpha, and logbeta (lines 7-9). The default priors can be disabled (see the section on priors).
- Model parameter estimates with 95% CIs. This information can be extracted using sumspict.parest(res).
- Deterministic reference points, which can be extracted using sumspict.drefpoints(res).
- Stochastic reference points, which can be extracted using sumspict.srefpoints(res).
- States: the estimated biomass (B) and fishing mortality (F) with the year of the estimate appended. The year is shown as a decimal number, as estimates within the year are possible. Both absolute (B and F) and relative (B/Bmsy and F/Fmsy) estimates are shown. The relative estimates are calculated using the type of reference points given by msytype (line 42), where s is stochastic and d is deterministic; here msytype is 's'. This information can be extracted using sumspict.states(res).
- Predictions: the predicted states at the time given by inp$timepredi, here 1990 (lines 51-54), and the predicted catch at the time given by inp$timepredc (line 55). Finally, the equilibrium biomass, indicated by E(B_inf), if current conditions remain constant. These predictions are calculated under the fishing scenario given by inp$ffac; see the section on forecasting for more information. The prediction summary can be extracted using sumspict.predictions(res).

spict comes with several plotting abilities. The basic plotting of the results is done using the generic function plot, which produces a multipanel plot with the most important outputs.
plot(res)
Some general comments can be made regarding the style and colours of these plots.
The individual plots can be plotted separately using the plotspict.*
family of plotting functions; all functions are summarised in Table 1 and their common arguments that control their look in Table 2:
Function | Plot |
---|---|
Data | |
plotspict.data | Basic data plotting (see above) |
plotspict.ci | Advanced data plotting (see above) |
Estimates | |
plotspict.bbmsy | Relative biomass \(B/B_{MSY}\) estimates with uncertainty |
plotspict.biomass | Absolute (and relative) biomass estimates with uncertainty |
plotspict.btrend | Expected biomass trend |
plotspict.catch | Catch data and estimates |
plotspict.f | Absolute (and relative) fishing mortality \(F\) |
plotspict.fb | Kobe plot of relative fishing mortality over biomass estimates |
plotspict.ffmsy | Relative fishing mortality \(F/F_{MSY}\) |
plotspict.priors | Prior-posterior distribution of all parameters that are estimated using priors |
plotspict.production | Production over \(B/K\) |
plotspict.season | Seasonal pattern of fishing mortality \(F\) |
Diagnostics & extras | |
plotspict.diagnostic | OSA residual analysis to evaluate the fit |
plotspict.osar | One-step-ahead residual plots, one for each data time series |
plotspict.likprof | Profile likelihood of one or two parameters |
plotspict.retro | Retrospective analysis |
plotspict.infl | Influence statistics of observations |
plotspict.inflsum | Summary of influence of observations |
plotspict.tc | Time to \(B_{MSY}\) under different scenarios about \(F\) |
Argument | Value | Result |
---|---|---|
logax | logical | If TRUE, the y-axis is in log scale |
main | string | The title of the plot |
ylim | numeric vector | The limits of the y-axis |
plot.obs | logical | If TRUE (default), the observations are shown |
qlegend | logical | If TRUE (default), the colour legend is shown |
xlab, ylab | string | The x and y axis labels |
stamp | string | Adds a "stamp" at the bottom right corner of the plotting area. Default is the version and SHA hash of spict. An empty string removes the stamp. |
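As a sketch of how these arguments combine (the title text is arbitrary, and the call is guarded so it only runs where the spict package is installed):

```r
if (requireNamespace("spict", quietly = TRUE)) {
  library(spict)
  data(pol)
  res <- fit.spict(pol$albacore)
  # Custom title, no colour legend, fixed y-axis limits, no version stamp
  plotspict.biomass(res, main = "South Atlantic albacore",
                    qlegend = FALSE, ylim = c(0, 300), stamp = "")
}
```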
We will now look at them one at a time. The top left is the plot of absolute biomass
plotspict.biomass(res)
Note that this plot has a y-axis on the right side related to the relative biomass (\(B_t/B_{MSY}\)). The shaded 95% CI region relates to this axis, while the dashed blue lines relate to the left y-axis indicating absolute levels. The dashed lines and the shaded region are shown on the same plot to make it easier to assess whether the relative or absolute levels are most accurately estimated. Here, the absolute levels are estimated more accurately than the relative ones. Later, we will see examples of the opposite. The horizontal black line is the estimate of \(B_{MSY}\) with its 95% CI shown as a grey region.
The plot of the relative biomass is produced using
plotspict.bbmsy(res)
This plot contains much of the same information as plotspict.biomass, but without the information about absolute biomass and without the 95% CI around the \(B_{MSY}\) reference point.
The plots of fishing mortality follow the same principles
plotspict.f(res, main='', qlegend=FALSE, rel.axes=FALSE, rel.ci=FALSE)
plotspict.ffmsy(res, main='', qlegend=FALSE)
The estimate of \(F_{MSY}\) is shown with a horizontal black line with its 95% CI shown as a grey region (left plot). The 95% CI of \(F_{MSY}\) is very wide in this case. As shown here, it is quite straightforward to remove the information about relative levels from the plot of absolute fishing mortality. Furthermore, the argument main='' removes the heading and qlegend=FALSE removes the colour legend for data points.
The plot of the catch is produced using
plotspict.catch(res)
This plot shows estimated catches (blue line) versus observed catches (points) with the estimate of \(MSY\) plotted as a horizontal black line with its 95% CI given by the grey region.
A phase plot (or kobe plot) of fishing mortality versus biomass is plotted using
plotspict.fb(res, ylim=c(0, 1.3), xlim=c(0, 300))
The plot shows the development of biomass and fishing mortality since the initial year (here 1967) indicated with a circle until the terminal year (here 1990) indicated with a square. The yellow diamond indicates the mean biomass over a long period if the current (1990) fishing pressure remains. This point can be interpreted as the fished equilibrium and is denoted \(E(B_\infty)\) in the legend as a statistical way of expressing the expectation of the biomass as \(t \rightarrow \infty\). As the current fishing mortality is close to \(F_{MSY}\) the expected long term biomass is close to \(B_{MSY}\).
A vertical dashed red line at \(B_t = 0\) indicates the biomass level below which the stock has crashed. The grey shaded banana-shaped area indicates the 95% confidence region of the pair \(F_{MSY}\), \(B_{MSY}\). This region is important to visualise jointly as the two reference points are highly (negatively) correlated.
Before proceeding with the results for an actual assessment it is very important that the model residuals are checked and possible model deficiencies identified. Residuals can be calculated using calc.osa.resid(). OSA stands for one-step-ahead; these are the proper residuals for state-space models. More information about OSA residuals is contained in Pedersen and Berg (2016). To calculate and plot residuals and diagnostics do
res <- calc.osa.resid(res)
plotspict.diagnostic(res)
The first column of the plot contains information related to the catch data and the second column information related to the index data; each row shows a different residual diagnostic.
These data did not show any significant violations of the assumptions, which increases confidence in the results. For a discussion of possible violations and remedies the reader is referred to Pedersen and Berg (2016).
To extract an estimated quantity, here logBmsy, use
get.par('logBmsy', res)
#> ll est ul sd cv
#> logBmsy 2.73458 4.106547 5.478514 0.6999831 0.1704554
This returns a vector with ll being the lower 95% limit of the CI, est the estimated value, ul the upper 95% limit of the CI, sd the standard deviation of the estimate, and cv the coefficient of variation of the estimate. The estimated quantity can also be returned on the natural scale (as opposed to the log scale) by running
get.par('logBmsy', res, exp=TRUE)
#> ll est ul sd cv
#> logBmsy 15.40327 60.73661 239.4905 0.6999831 0.7951589
This essentially takes the exponential of ll, est, and ul of the log-scale values, while sd is unchanged as it is the standard deviation on the scale where the quantity is estimated (here log). When transforming using exp=TRUE the \(CV = \sqrt{e^{\sigma^2}-1}\). Most parameters are log-transformed during estimation and should therefore be extracted using exp=TRUE.
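The log-normal CV transformation can be checked directly with the numbers above (the sd of logBmsy is taken from the get.par output):

```r
sd_log <- 0.6999831                # sd of logBmsy on the log scale
cv_nat <- sqrt(exp(sd_log^2) - 1)  # CV on the natural scale
cv_nat                             # approximately 0.795, as in the cv column
```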
For a standard fit (not using robust observation error, seasonality etc.), the quantities that can be extracted using this method are
list.quantities(res)
#> [1] "Bmsy" "Bmsy2" "Bmsyd"
#> [4] "Bmsys" "Cp" "Emsy"
#> [7] "Emsy2" "Fmsy" "Fmsyd"
#> [10] "Fmsys" "K" "MSY"
#> [13] "MSYd" "MSYs" "gamma"
#> [16] "isdb2" "isdc2" "isde2"
#> [19] "isdf2" "isdi2" "logB"
#> [22] "logBBmsy" "logBl" "logBlBmsy"
#> [25] "logBlK" "logBmsy" "logBmsyPluslogFmsy"
#> [28] "logBmsyd" "logBmsys" "logBp"
#> [31] "logBpBmsy" "logBpK" "logCp"
#> [34] "logCpred" "logEmsy" "logEmsy2"
#> [37] "logEp" "logF" "logFFmsy"
#> [40] "logFFmsynotS" "logFl" "logFlFmsy"
#> [43] "logFmsy" "logFmsyd" "logFmsys"
#> [46] "logFnotS" "logFp" "logFpFmsy"
#> [49] "logFs" "logIp" "logIpred"
#> [52] "logK" "logMSY" "logMSYd"
#> [55] "logMSYs" "logalpha" "logbeta"
#> [58] "logbkfrac" "logm" "logn"
#> [61] "logq" "logq2" "logr"
#> [64] "logrc" "logrold" "logsdb"
#> [67] "logsdc" "logsdf" "logsdi"
#> [70] "m" "p" "q"
#> [73] "r" "rc" "rold"
#> [76] "sdb" "sdc" "sde"
#> [79] "sdf" "sdi" "seasonsplinefine"
These should be relatively self-explanatory, knowing that reference points ending with s are stochastic and those ending with d are deterministic, that quantities ending with p are predictions, and that quantities ending with l are estimates in the final year. If a quantity is available both on the natural and the log scale, it is preferable to transform the quantity from log, as most quantities are estimated on the log scale.
The covariance between the model parameters (fixed effects) can be extracted from the results list
res$cov.fixed
#> logm logK logq logn logsdb
#> logm 0.0204039173 -0.005706841 0.026834631 -0.156739851 0.032239971
#> logK -0.0057068412 0.037105159 -0.038075150 -0.011341944 -0.021784068
#> logq 0.0268346313 -0.038075150 0.091311497 -0.227633881 0.040958483
#> logn -0.1567398509 -0.011341944 -0.227633881 1.473734482 -0.190011443
#> logsdb 0.0322399705 -0.021784068 0.040958483 -0.190011443 0.980090447
#> logsdf 0.0014253322 -0.002407442 0.003270286 -0.006648002 -0.002144162
#> logsdi -0.0007603657 -0.002521063 0.002848536 0.007158181 0.010535656
#> logsdc 0.0015534820 0.001300449 -0.008392720 0.001998018 0.137919924
#> logsdf logsdi logsdc
#> logm 0.001425332 -0.0007603657 0.001553482
#> logK -0.002407442 -0.0025210626 0.001300449
#> logq 0.003270286 0.0028485364 -0.008392720
#> logn -0.006648002 0.0071581810 0.001998018
#> logsdb -0.002144162 0.0105356564 0.137919924
#> logsdf 0.026288158 -0.0002784340 -0.035159217
#> logsdi -0.000278434 0.0237200081 0.005885892
#> logsdc -0.035159217 0.0058858923 0.846808980
It is however easier to interpret the correlation rather than covariance. The correlation matrix can be calculated using
cov2cor(res$cov.fixed)
#> logm logK logq logn logsdb
#> logm 1.00000000 -0.207406260 0.62169321 -0.903884687 0.22798422
#> logK -0.20740626 1.000000000 -0.65412650 -0.048502121 -0.11423226
#> logq 0.62169321 -0.654126497 1.00000000 -0.620532523 0.13691406
#> logn -0.90388469 -0.048502121 -0.62053252 1.000000000 -0.15810189
#> logsdb 0.22798422 -0.114232258 0.13691406 -0.158101887 1.00000000
#> logsdf 0.06154308 -0.077083010 0.06674871 -0.033775472 -0.01335809
#> logsdi -0.03456275 -0.084978500 0.06120708 0.038285616 0.06909890
#> logsdc 0.01181835 0.007336409 -0.03018195 0.001788533 0.15139140
#> logsdf logsdi logsdc
#> logm 0.06154308 -0.03456275 0.011818346
#> logK -0.07708301 -0.08497850 0.007336409
#> logq 0.06674871 0.06120708 -0.030181947
#> logn -0.03377547 0.03828562 0.001788533
#> logsdb -0.01335809 0.06909890 0.151391400
#> logsdf 1.00000000 -0.01115025 -0.235649427
#> logsdi -0.01115025 1.00000000 0.041530023
#> logsdc -0.23564943 0.04153002 1.000000000
For these data most parameters are well separated, i.e. have relatively low correlation, perhaps with the exception of logm and logn, which have a correlation of \(-0.9\). Note that logr is absent from the covariance matrix. This is because the model is parameterised in terms of logm, logK, and logn, from which logr can be derived. The estimate of logr is reported using TMB's sdreport() function and can be extracted using get.par().
The covariance between random effects (biomass and fishing mortality) is not reported automatically, but can be obtained by setting inp$getJointPrecision to TRUE (this entails longer computation time and a larger memory requirement).
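A minimal sketch of setting this flag on an input list before fitting (the catch and index values are the first three albacore observations shown earlier; the fit itself is commented out because it is slow):

```r
inp <- list(obsC = c(15.9, 25.7, 28.5), timeC = 1967:1969,
            obsI = c(61.89, 78.98, 55.59), timeI = 1967:1969)
inp$getJointPrecision <- TRUE  # request joint precision of the random effects
# res <- fit.spict(inp)        # the fitted object then contains the joint
#                              # precision (at extra time and memory cost)
```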
The covariances between sdreported values (i.e. the values reported in res$value) are given in res$cov. As this matrix is typically large, the function get.cov() can be used to extract the covariance between two scalar quantities
cov2cor(get.cov(res, 'logBmsy', 'logFmsy'))
#> [,1] [,2]
#> [1,] 1.0000000 -0.9982507
#> [2,] -0.9982507 1.0000000
This reveals that for this data set the estimates of logFmsy and logBmsy are highly correlated. This is often the case and is the reason why the model is reparameterised.
Retrospective plots are sometimes used to evaluate the robustness of the model fit to the introduction of new data, i.e. to check whether the fit changes substantially when new data become available. Such calculations, and plotting thereof, can be performed using retro() as shown here
rep <- fit.spict(pol$albacore)
rep <- retro(rep)
plotspict.retro(rep)
By default retro creates 5 scenarios in which the catch and index time series are shortened by the last 1 to 5 observations. The number of scenarios, and thus the number of observations removed, can be changed with the argument nretroyear of retro. The graphs show the different scenarios in different colours. For the albacore data there is high consistency between the scenarios, except for the fishing mortalities of the second scenario (in red), which indicate a large increase in F.
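For example, to peel off only the last three years instead of five (a sketch guarded so it only runs where the spict package is installed):

```r
if (requireNamespace("spict", quietly = TRUE)) {
  library(spict)
  data(pol)
  rep <- fit.spict(pol$albacore)
  rep <- retro(rep, nretroyear = 3)  # 3 retrospective peels instead of 5
  plotspict.retro(rep)
}
```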
The estimation can be done using more than one biomass index, for example when scientific surveys are performed more than once every year or when there are both commercial and survey CPUE time-series available. The following example emulates a situation where a long but noisy first quarter index series and a shorter and less noisy second quarter index series are available with different catchabilities
inp <- list(timeC=pol$albacore$timeC, obsC=pol$albacore$obsC)
inp$timeI <- list(pol$albacore$timeI, pol$albacore$timeI[10:23]+0.25)
inp$obsI <- list()
inp$obsI[[1]] <- pol$albacore$obsI * exp(rnorm(23, sd=0.1)) # Index 1
inp$obsI[[2]] <- 10*pol$albacore$obsI[10:23] # Index 2
res <- fit.spict(inp)
sumspict.parest(res)
#> estimate cilow ciupp log.est
#> alpha1 8.83972780 1.79186937 43.60852914 2.1792561
#> alpha2 4.11531162 0.79275190 21.36329111 1.4147146
#> beta 0.11882406 0.01768743 0.79825932 -2.1301113
#> r 0.22724791 0.11129671 0.46399947 -1.4817137
#> rc 1.14447872 0.18975522 6.90274317 0.1349493
#> rold 0.37693745 0.04622688 3.07357636 -0.9756760
#> m 24.52130218 17.95907722 33.48135614 3.1995422
#> K 198.26039690 138.07049663 284.68924165 5.2895813
#> q1 0.34889558 0.21709757 0.56070698 -1.0529826
#> q2 3.67515045 2.31852238 5.82557707 1.3015941
#> n 0.39712037 0.04057183 3.88704680 -0.9235158
#> sdb 0.01715719 0.00356478 0.08257703 -4.0653381
#> sdf 0.37148021 0.27072896 0.50972585 -0.9902597
#> sdi1 0.15166486 0.11306890 0.20343551 -1.8860821
#> sdi2 0.07060717 0.04696496 0.10615090 -2.6506236
#> sdc 0.04414079 0.00721437 0.27007341 -3.1203710
plotspict.biomass(res)
The model estimates separate observation noise for each index and finds that the first index (sdi1) is noisier than the second (sdi2). It is furthermore estimated that the catchabilities differ by a factor of 10 (q1 versus q2). The biomass plot shows both indices, with circles indicating the first index and squares the second (the two series can also be distinguished by their colours).
It is possible to use effort data directly in the model instead of calculating commercial CPUE and inputting this as an index. It is beyond the scope of this vignette to discuss all problems associated with indices based on commercial CPUEs; however, it is intuitively clear that using the same information twice (catch as catch and catch in catch/effort) induces a correlation that the model does not account for. These problems are easily avoided by supplying catch and effort separately
inpeff <- list(timeC=pol$albacore$timeC, obsC=pol$albacore$obsC,
timeE=pol$albacore$timeC, obsE=pol$albacore$obsC/pol$albacore$obsI)
repeff <- fit.spict(inpeff)
sumspict.parest(repeff)
#> estimate cilow ciupp log.est
#> beta 0.07385347 0.01464722 0.37238017 -2.6056723
#> r 0.23822939 0.10656855 0.53255150 -1.4345212
#> rc 1.14399163 0.21452271 6.10059821 0.1345236
#> rold 0.40826819 0.03851360 4.32789743 -0.8958310
#> m 24.16376130 18.62117904 31.35608969 3.1848540
#> K 189.53177410 132.49527140 271.12132390 5.2445567
#> qf 0.41117179 0.25562626 0.66136492 -0.8887442
#> n 0.41648800 0.04180192 4.14962359 -0.8758976
#> sdb 0.01430671 0.00236752 0.08645421 -4.2470266
#> sdf 0.37625014 0.27938787 0.50669403 -0.9775011
#> sde 0.09820098 0.06985342 0.13805240 -2.3207391
#> sdc 0.02778738 0.00568262 0.13587724 -3.5831734
par(mfrow=c(2, 2))
plotspict.bbmsy(repeff)
plotspict.ffmsy(repeff, qlegend=FALSE)
plotspict.catch(repeff, qlegend=FALSE)
plotspict.fb(repeff)
Here the model runs without a biomass index and instead uses effort as an index of fishing mortality. Note that index observations are missing from the biomass plot, while effort observations are present in the plot of fishing mortality. Note also that q is missing from the summary of parameter estimates; instead qf, the commercial catchability, is present.
Overall, for this data set the results in terms of stock status etc. do not change much, and this will probably often be the case; however, using effort data directly instead of commercial CPUE is cleaner and avoids inputting the same data twice.
It is not always appropriate to assume that the observation noise of a data series is constant in time. Knowledge that certain data points are more uncertain than others can be incorporated using stdevfacC, stdevfacI, and stdevfacE, which are vectors of factors multiplied onto the standard deviation of the corresponding observations. Below is an example where the first 10 years of the catch series are considered uncertain relative to the remaining time series and therefore have their standard deviation scaled by a factor of 5.
inp <- pol$albacore
res1 <- fit.spict(inp)
inp$stdevfacC <- rep(1, length(inp$obsC))
inp$stdevfacC[1:10] <- 5
res2 <- fit.spict(inp)
par(mfrow=c(2, 1))
plotspict.catch(res1, main='No scaling')
plotspict.catch(res2, main='With scaling', qlegend=FALSE)
From the plot it is noted that the scaling factor widens the 95% CIs of the initial ten years of catch data, while narrowing the 95% CIs of the remaining years.
The package has built-in functionality for simulating data, which is useful for testing.
Data are simulated using an input list, e.g. inp, containing parameter values specified in inp$ini. To simulate data using default parameters run
inp <- check.inp(pol$albacore)
sim <- sim.spict(inp)
plotspict.data(sim)
This will generate catch and index data of the same length as the input catch and index time series (here 23 of each) at the time points of the input data. Note that when plotting simulated data, the true biomass and fishing mortality are also included in the plot.
Another simple example is
inp <- list(ini=list(logK=log(100), logm=log(10), logq=log(1)))
sim <- sim.spict(inp, nobs=50)
plotspict.data(sim)
Here the required parameters are specified (the rest take default values), and the number of observations is given as an argument to sim.spict().
A more customised example including model fitting is
set.seed(31415926)
inp <- list(ini=list(logK=log(100), logm=log(10), logq=log(1),
logbkfrac=log(1), logF0=log(0.3), logsdc=log(0.1),
logsdf=log(0.3)))
sim <- sim.spict(inp, nobs=30)
res <- fit.spict(sim)
sumspict.parest(res)
#> estimate true cilow ciupp true.in.ci log.est
#> alpha 1.04607711 -9.0 0.31604357 3.4624255 -9 0.04504709
#> beta 0.13191757 -9.0 0.02269878 0.7666599 -9 -2.02557801
#> r 1.07672251 -9.0 0.35061486 3.3065666 -9 0.07392172
#> rc 0.63474118 -9.0 0.34468261 1.1688909 -9 -0.45453795
#> rold 0.45001540 -9.0 0.22252978 0.9100528 -9 -0.79847347
#> m 14.48076844 10.0 10.46872828 20.0303847 0 2.67282145
#> K 76.02606890 100.0 46.11318792 125.3429531 1 4.33107629
#> q 1.19617687 1.0 0.85013702 1.6830688 1 0.17913053
#> n 3.39263481 2.0 1.36225519 8.4492032 1 1.22160685
#> sdb 0.19525936 0.2 0.08857955 0.4304178 1 -1.63342656
#> sdf 0.34363008 0.3 0.25198736 0.4686014 1 -1.06818954
#> sdi 0.20425635 0.2 0.12166919 0.3429024 1 -1.58837948
#> sdc 0.04533085 0.1 0.00814469 0.2522975 1 -3.09376755
par(mfrow=c(2, 2))
plotspict.biomass(res)
plotspict.f(res, qlegend=FALSE)
plotspict.catch(res, qlegend=FALSE)
plotspict.fb(res)
Here the ratio between the biomass in the initial year and K is set using logbkfrac, the initial fishing mortality is set using logF0, the process noise of F is set using logsdf, and finally the observation noise on catches is specified using logsdc.
When printing the summary of the parameter estimates, the true values are included, as well as a check of whether each true value was inside the 95% CI. Similarly, the true biomass, fishing mortality, and reference points are included in the results plot using a yellow/orange colour.
It is possible to simulate seasonal data (most often quarterly). Additional variables must be specified in the input list that define the type of seasonality to be used. Spline-based seasonality (inp$seasontype = 1) is shown first. This is the default and therefore does not need to be specified explicitly. The number of seasons must be specified using nseasons (4 indicates quarterly), the order of the spline must be specified using splineorder (3 for quarterly data), the time vectors for catch and index must contain subannual time points, and finally the spline parameters (logphi) must be set. With four seasons logphi must be a vector of length 3, where each value gives the log fishing intensity relative to the level in season four, which is log(1). An example of simulating seasonal data using a spline is
set.seed(1234)
inp <- list(nseasons=4, splineorder=3)
inp$timeC <- seq(0, 30-1/inp$nseasons, by=1/inp$nseasons)
inp$timeI <- seq(0, 30-1/inp$nseasons, by=1/inp$nseasons)
inp$ini <- list(logK=log(100), logm=log(20), logq=log(1),
logbkfrac=log(1), logsdf=log(0.4), logF0=log(0.5),
logphi=log(c(0.05, 0.1, 1.8)))
seasonsim <- sim.spict(inp)
plotspict.data(seasonsim)
The data plot shows clear seasonality in the catches. To simulate seasonal data using the coupled SDE approach, seasontype must be set to 2 and nseasons to 4.
set.seed(432)
inp <- list(nseasons=4, seasontype=2)
inp$timeC <- seq(0, 30-1/inp$nseasons, by=1/inp$nseasons)
inp$timeI <- seq(0, 30-1/inp$nseasons, by=1/inp$nseasons)
inp$ini <- list(logK=log(100), logm=log(20), logq=log(1),
logbkfrac=log(1), logsdf=log(0.4), logF0=log(0.5))
seasonsim2 <- sim.spict(inp)
plotspict.data(seasonsim2)
Catch information available in sub-annual aggregations, e.g. quarterly catches, can be used to estimate the seasonal pattern of the fishing mortality. The user can choose between two types of seasonality by setting seasontype to 1 or 2. A technical description of the season types can be found in Pedersen and Berg (2016).
Here, an example of a spline-based model fitted to the quarterly data simulated in the section above is shown
seasonres <- fit.spict(seasonsim)
plotspict.biomass(seasonres)
plotspict.f(seasonres, qlegend=FALSE)
plotspict.season(seasonres)
The model is able to estimate the seasonal variation in fishing mortality, as seen both in the plot of F and in the plot of the estimated spline, where blue is the estimated spline, orange is the true spline, and green is the spline if time were truly continuous (it is discretised with the Euler steps shown by the blue line).
To fit the coupled SDE model run
seasonres2 <- fit.spict(seasonsim2)
sumspict.parest(seasonres2)
#> estimate true cilow ciupp true.in.ci log.est
#> alpha 1.38654513 -9.0 0.74646650 2.5754772 -9 0.3268151
#> beta 0.69812209 -9.0 0.46862508 1.0400093 -9 -0.3593613
#> r 0.75461230 -9.0 0.19653877 2.8973404 -9 -0.2815512
#> rc 0.57786149 -9.0 0.38165597 0.8749343 -9 -0.5484211
#> rold 0.46819698 -9.0 0.22851759 0.9592627 -9 -0.7588662
#> m 20.30400508 20.0 15.39627326 26.7761305 1 3.0108182
#> K 127.48850154 100.0 77.62144021 209.3921213 1 4.8480262
#> q 0.75175503 1.0 0.51602963 1.0951612 1 -0.2853448
#> n 2.61174108 2.0 0.81672644 8.3518681 1 0.9600171
#> sdb 0.13137017 0.2 0.07696246 0.2242408 1 -2.0297362
#> sdu 0.10517086 0.1 0.05695791 0.1941944 1 -2.2521690
#> sdf 0.31639458 0.4 0.23247974 0.4305989 1 -1.1507652
#> sdi 0.18215067 0.2 0.15479277 0.2143438 1 -1.7029211
#> sdc 0.22088205 0.2 0.18154286 0.2687458 1 -1.5101265
#> lambda 0.06185887 0.1 0.00855678 0.4471918 1 -2.7828998
plotspict.biomass(seasonres2)
plotspict.f(seasonres2, qlegend=FALSE)
Two parameters related to the coupled SDEs are estimated (sdu and lambda), as evident from the summary of estimated parameters. In the plot of fishing mortality it is noted that the amplitude of the seasonal pattern varies over time. This is a property of the coupled SDE model that cannot be obtained with the spline-based seasonal model. The spline-based model has a fixed amplitude and phase, which will lead to biased estimates and autocorrelated residuals if in reality the seasonal pattern shifts a bit. This is illustrated by fitting a spline-based model to data generated with a coupled SDE model
inp2 <- list(obsC=seasonsim2$obsC, obsI=seasonsim2$obsI,
timeC=seasonsim2$timeC, timeI=seasonsim2$timeI,
seasontype=1, true=seasonsim2$true)
rep2 <- fit.spict(inp2)
rep2 <- calc.osa.resid(rep2)
plotspict.diagnostic(rep2)
From the diagnostics it is clear that autocorrelation is present in the catch residuals.
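Outside the diagnostic plot, the lag-one autocorrelation of any residual series can also be inspected numerically with base R's acf(). A generic illustration with a synthetic residual vector (not taken from the fit) that contains a seasonal leftover:

```r
# Synthetic residual series with an obvious periodic component (illustration only)
set.seed(7)
resid <- sin(seq(0, 8 * pi, length.out = 60)) + rnorm(60, sd = 0.1)
# Lag-one autocorrelation; values far from zero indicate autocorrelated residuals
acf(resid, lag.max = 4, plot = FALSE)$acf[2]
```

A residual series without structure would give a lag-one value close to zero.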
Initial parameter values used as the starting guess of the optimiser can be set using inp$ini. For example, to specify the initial value of logK set
inp <- pol$albacore
inp$ini$logK <- log(100)
This procedure generalises to all other model parameters. If initial values are not specified they are set to default values. To see the default initial value of a parameter, here logK, run
inp <- check.inp(pol$albacore)
inp$ini$logK
#> [1] 5.010635
This can also be done after fitting the model by printing res$inp$ini$logK.
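For this dataset the printed default happens to equal log(4 * max(obsC)); treating this as the general rule is an observation from the numbers here, not documented behaviour:

```r
# Maximum catch in pol$albacore is 37.5, so 4 * 37.5 = 150
log(4 * 37.5)   # 5.010635, matching the default inp$ini$logK printed above
```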
It is prudent to check that the same parameter estimates are obtained if using different initial values. If the optimum of the objective function is poorly defined, i.e. possibly containing multiple optima, it is possible that different parameter estimates will be returned depending on the initial values. To check whether this is the case run
set.seed(123)
check.ini(pol$albacore, ntrials=4)
#> Checking sensitivity of fit to initial parameter values...
#> Trial 1 ... model fitted!
#> Trial 2 ... model fitted!
#> Trial 3 ... model fitted!
#> Trial 4 ... model fitted!
#> $propchng
#> logm logK logq logn logsdb logsdf logsdi logsdc
#> Trial 1 -1.41 0.26 -0.12 -2.75 -1.26 1.30 -0.08 -1.12
#> Trial 2 0.34 -0.04 0.62 0.34 -0.51 -0.21 1.14 -1.14
#> Trial 3 -1.69 -0.42 -0.23 -3.26 -1.11 -0.55 -0.40 -1.41
#> Trial 4 1.03 0.19 0.06 -0.68 0.60 1.01 -1.32 -1.15
#>
#> $inimat
#> Distance logn logK logm logq logsdb logsdf logsdi logsdc
#> Basevec 0.00 0.69 5.01 3.41 -0.64 -1.61 -1.61 -1.61 -1.61
#> Trial 1 4.22 -0.29 6.34 2.99 1.12 0.42 -3.70 -1.48 0.20
#> Trial 2 3.48 0.93 4.81 5.51 -0.86 -0.79 -1.27 -3.44 0.23
#> Trial 3 4.52 -0.48 2.90 2.61 1.45 0.18 -0.72 -0.96 0.67
#> Trial 4 3.64 1.41 5.97 3.61 -0.21 -2.58 -3.23 0.52 0.24
#>
#> $resmat
#> Distance m K q n sdb sdf sdi sdc
#> Basevec 0 22.58 201.48 0.35 0.69 0.01 0.37 0.11 0.04
#> Trial 1 0 22.58 201.48 0.35 0.69 0.01 0.37 0.11 0.04
#> Trial 2 0 22.58 201.48 0.35 0.69 0.01 0.37 0.11 0.04
#> Trial 3 0 22.58 201.48 0.35 0.69 0.01 0.37 0.11 0.04
#> Trial 4 0 22.58 201.48 0.35 0.69 0.01 0.37 0.11 0.04
The argument ntrials sets the number of different initial values to test. To keep things simple only a few trials are generated here; for real data cases more should be used, say 30. The propchng element contains the proportional change of each randomly generated initial value relative to the base initial value, inimat contains the new randomly generated initial values, and resmat contains the resulting parameter estimates and a distance from the estimated parameter vector to the base parameter vector. The distance should preferably be close to zero. If that is not the case, further investigation is required, i.e. inspection of objective function values, differences in results, residual diagnostics etc. The example shown here looks fine in that all converged runs return the same parameter estimates. Non-converging trials are to some extent expected, as the initial parameters are generated independently from a wide uniform distribution and may thus by chance be very inappropriately chosen.
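The perturbation idea can be sketched in a few lines of base R (an assumed simplification of what check.ini does; the function itself also refits the model for each trial):

```r
# Sketch: draw a proportional change per parameter from a wide uniform
# distribution and apply it to the base initial values (assumed mechanism)
set.seed(1)
basevec <- c(logK = 5.01, logm = 3.41)         # base initial values
propchng <- runif(length(basevec), -2, 2)      # proportional changes
trialvec <- basevec * (1 + propchng)           # perturbed initial values
```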
The package has the ability to estimate parameters in phases. Users familiar with AD Model Builder will know that this means that some parameters are held constant in phase 1, some are then released and estimated in phase 2, more are released in phase 3, etc. until all parameters are estimated. By default all parameters are estimated in phase 1. As an example, the standard deviation of the biomass process, logsdb, is estimated in phase 2:
inp <- pol$albacore
inp$phases$logsdb <- 2
res <- fit.spict(inp)
#> Estimating - phase 1
#> Estimating - phase 2
Phases can also be used to fix parameters at their initial value by setting the phase to -1. For example
inp <- pol$albacore
inp$phases$logsdb <- -1
inp$ini$logsdb <- log(0.1)
res <- fit.spict(inp)
summary(res)
#> Convergence: 0 MSG: relative convergence (4)
#> Objective function at optimum: 5.8647428
#> Euler time step (years): 1/16 or 0.0625
#> Nobs C: 23, Nobs I1: 23
#>
#> Priors
#> logn ~ dnorm[log(2), 2^2]
#> logalpha ~ dnorm[log(1), 2^2]
#> logbeta ~ dnorm[log(1), 2^2]
#>
#> Fixed parameters
#> fixed.value
#> sdb 0.1
#> phi NA
#>
#> Model parameter estimates w 95% CI
#> estimate cilow ciupp log.est
#> alpha 1.0503613 0.6791518 1.6244657 0.0491342
#> beta 0.1713918 0.0256773 1.1440113 -1.7638034
#> r 0.3471988 0.1465244 0.8227093 -1.0578579
#> rc 1.7791121 0.2676089 11.8278559 0.5761144
#> rold 0.5694636 0.0689479 4.7033865 -0.5630604
#> m 27.4846402 18.9163563 39.9339827 3.3136273
#> K 144.5708243 82.7912265 252.4509434 4.9737695
#> q 0.4966500 0.2541097 0.9706881 -0.6998696
#> n 0.3903056 0.0392947 3.8768218 -0.9408251
#> sdf 0.3738951 0.2594015 0.5389235 -0.9837801
#> sdi 0.1050361 0.0679152 0.1624466 -2.2534509
#> sdc 0.0640825 0.0113802 0.3608516 -2.7475835
#>
#> Deterministic reference points (Drp)
#> estimate cilow ciupp log.est
#> Bmsyd 30.8970294 6.2838311 151.917901 3.4306600
#> Fmsyd 0.8895561 0.1338045 5.913928 -0.1170327
#> MSYd 27.4846402 18.9163563 39.933983 3.3136273
#> Stochastic reference points (Srp)
#> estimate cilow ciupp log.est rel.diff.Drp
#> Bmsys 30.8170212 6.3481131 149.601746 3.428067 -0.0025962349
#> Fmsys 0.8901022 0.1349377 5.871466 -0.116419 0.0006135542
#> MSYs 27.4303408 18.8037385 40.014575 3.311650 -0.0019795378
#>
#> States w 95% CI (inp$msytype: s)
#> estimate cilow ciupp log.est
#> B_1989.00 43.5729051 21.6014377 87.892208 3.7744355
#> F_1989.00 0.5677372 0.2613795 1.233170 -0.5660967
#> B_1989.00/Bmsy 1.4139233 0.3592721 5.564527 0.3463683
#> F_1989.00/Fmsy 0.6378337 0.1112640 3.656454 -0.4496777
#>
#> Predictions w 95% CI (inp$msytype: s)
#> prediction cilow ciupp log.est
#> B_1990.00 44.7705544 21.9026960 91.513965 3.8015507
#> F_1990.00 0.5737431 0.2522597 1.304930 -0.5555735
#> B_1990.00/Bmsy 1.4527866 0.3371972 6.259213 0.3734835
#> F_1990.00/Fmsy 0.6445812 0.1037380 4.005138 -0.4391545
#> Catch_1990.00 25.8407642 15.9509270 41.862463 3.2519533
#> E(B_inf) 45.9909037 NA NA 3.8284436
SPiCT is a generalisation of previous surplus production models in the sense that stochastic noise is included in both observation and state processes of both fishing and biomass. Estimating all model parameters is only possible if data contain sufficient information, which may not be the case for short time series or time series with limited contrast. The basic data requirements of the model are limited to only catch and biomass index time series. More information may be available, which can be used to improve the model fit. This is particularly advantageous if the model is not able to converge with only catch and index time series. Additional information can then be included in the fit via prior distributions for model parameters.
Quantities that are traditionally difficult to estimate are logn and the noise ratios logalpha and logbeta, where logalpha = logsdi - logsdb and logbeta = logsdc - logsdf. Therefore, to stabilise estimation, semi-informative default priors are imposed on these quantities that inhibit them from taking extreme and unrealistic values. If informative data are available these priors should have limited effect on the results; if informative data are not available the estimates will reduce to the priors.
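These ratios can be verified directly from any summary; for instance, in the fit above where sdb was fixed at 0.1, the reported alpha equals sdi/sdb (numbers copied from that summary):

```r
sdb <- 0.1          # fixed value
sdi <- 0.1050361    # estimate from the summary above
alpha <- sdi / sdb  # 1.050361, matching the reported alpha estimate
```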
If informative data are available and the default priors are therefore unwanted, they can be disabled using
inp <- pol$albacore
inp$priors$logn <- c(1, 1, 0)
inp$priors$logalpha <- c(1, 1, 0)
inp$priors$logbeta <- c(1, 1, 0)
fit.spict(inp)
#> Convergence: 0 MSG: relative convergence (4)
#> Objective function at optimum: 5.0598288
#> Euler time step (years): 1/16 or 0.0625
#> Nobs C: 23, Nobs I1: 23
#>
#> Fixed parameters
#> fixed.value
#> phi NA
#>
#> Model parameter estimates w 95% CI
#> estimate cilow ciupp log.est
#> alpha 39.0512850 0.0402366 3.790090e+04 3.6648758
#> beta 0.0245071 0.0000265 2.269299e+01 -3.7087912
#> r 0.1955750 0.0313372 1.220581e+00 -1.6318113
#> rc 1.2604800 0.0097765 1.625128e+02 0.2314926
#> rold 0.2835728 0.0024028 3.346681e+01 -1.2602862
#> m 24.3112479 12.0981964 4.885330e+01 3.1909391
#> K 210.4516033 123.9786484 3.572379e+02 5.3492557
#> q 0.3842438 0.2070701 7.130113e-01 -0.9564779
#> n 0.3103183 0.0004170 2.309134e+02 -1.1701567
#> sdb 0.0028040 0.0000029 2.728494e+00 -5.8767196
#> sdf 0.3809116 0.2827808 5.130959e-01 -0.9651879
#> sdi 0.1094986 0.0812449 1.475777e-01 -2.2118438
#> sdc 0.0093351 0.0000103 8.475656e+00 -4.6739791
#>
#> Deterministic reference points (Drp)
#> estimate cilow ciupp log.est
#> Bmsyd 38.57459 0.5969776 2492.55406 3.6525937
#> Fmsyd 0.63024 0.0048883 81.25641 -0.4616546
#> MSYd 24.31125 12.0981964 48.85330 3.1909391
#> Stochastic reference points (Srp)
#> estimate cilow ciupp log.est rel.diff.Drp
#> Bmsys 38.5744682 0.5969872 2492.49852 3.6525906 -3.107785e-06
#> Fmsys 0.6302411 0.0048883 81.25592 -0.4616529 1.767276e-06
#> MSYs 24.3112135 12.0980447 48.85377 3.1909377 -1.413322e-06
#>
#> States w 95% CI (inp$msytype: s)
#> estimate cilow ciupp log.est
#> B_1989.00 54.0126818 28.3907717 102.7576787 3.9892189
#> F_1989.00 0.4528693 0.2235663 0.9173593 -0.7921517
#> B_1989.00/Bmsy 1.4002184 0.0314790 62.2830821 0.3366283
#> F_1989.00/Fmsy 0.7185652 0.0079263 65.1417065 -0.3304988
#>
#> Predictions w 95% CI (inp$msytype: s)
#> prediction cilow ciupp log.est
#> B_1990.00 52.5650582 29.3174862 94.2470078 3.9620516
#> F_1990.00 0.4858020 0.2370661 0.9955179 -0.7219542
#> B_1990.00/Bmsy 1.3626904 0.0266106 69.7815296 0.3094610
#> F_1990.00/Fmsy 0.7708193 0.0076917 77.2473913 -0.2603013
#> Catch_1990.00 25.2158689 15.5407631 40.9143385 3.2274735
#> E(B_inf) 49.5039259 NA NA 3.9020520
The model is able to converge without priors; however, the estimates of alpha, beta and n are very uncertain, indicating that limited information is available about these parameters.
The model parameters to which priors can be applied can be listed using
list.possible.priors()
#> [1] "logn" "logalpha" "logbeta" "logr" "logK"
#> [6] "logm" "logq" "iqgamma" "logqf" "logbkfrac"
#> [11] "logB" "logF" "logBBmsy" "logFFmsy" "logsdb"
#> [16] "isdb2gamma" "logsdf" "isdf2gamma" "logsdi" "isdi2gamma"
#> [21] "logsde" "isde2gamma" "logsdc" "isdc2gamma" "logsdm"
#> [26] "logpsi" "mu"
A prior is set using
inp <- pol$albacore
inp$priors$logK <- c(log(300), 2, 1)
fit.spict(inp)
#> Convergence: 0 MSG: relative convergence (4)
#> Objective function at optimum: 3.697211
#> Euler time step (years): 1/16 or 0.0625
#> Nobs C: 23, Nobs I1: 23
#>
#> Priors
#> logK ~ dnorm[log(300), 2^2]
#> logn ~ dnorm[log(2), 2^2]
#> logalpha ~ dnorm[log(1), 2^2]
#> logbeta ~ dnorm[log(1), 2^2]
#>
#> Fixed parameters
#> fixed.value
#> phi NA
#>
#> Model parameter estimates w 95% CI
#> estimate cilow ciupp log.est
#> alpha 8.5541219 1.2276016 59.6064715 2.1464133
#> beta 0.1213066 0.0180837 0.8137331 -2.1094342
#> r 0.2543931 0.0999822 0.6472737 -1.3688747
#> rc 0.7408820 0.1423248 3.8567138 -0.2999139
#> rold 0.8120577 0.0018384 358.7037356 -0.2081839
#> m 22.5701177 17.0283130 29.9154832 3.1166268
#> K 202.2160641 138.6995451 294.8195435 5.3093367
#> q 0.3499366 0.1930339 0.6343737 -1.0500034
#> n 0.6867303 0.0623502 7.5637086 -0.3758136
#> sdb 0.0127865 0.0018399 0.0888615 -4.3593656
#> sdf 0.3672885 0.2672937 0.5046914 -1.0016078
#> sdi 0.1093772 0.0808912 0.1478947 -2.2129524
#> sdc 0.0445545 0.0073418 0.2703825 -3.1110420
#>
#> Deterministic reference points (Drp)
#> estimate cilow ciupp log.est
#> Bmsyd 60.927699 15.2912403 242.765426 4.1096879
#> Fmsyd 0.370441 0.0711624 1.928357 -0.9930611
#> MSYd 22.570118 17.0283130 29.915483 3.1166268
#> Stochastic reference points (Srp)
#> estimate cilow ciupp log.est rel.diff.Drp
#> Bmsys 60.9200355 15.2914370 242.701240 4.1095621 -1.257925e-04
#> Fmsys 0.3704531 0.0711556 1.928668 -0.9930283 3.266226e-05
#> MSYs 22.5680187 17.0227429 29.919706 3.1165338 -9.300659e-05
#>
#> States w 95% CI (inp$msytype: s)
#> estimate cilow ciupp log.est
#> B_1989.00 59.4317253 31.0677432 113.6912308 4.0848282
#> F_1989.00 0.4143985 0.2035123 0.8438121 -0.8809271
#> B_1989.00/Bmsy 0.9755694 0.3412397 2.7890532 -0.0247339
#> F_1989.00/Fmsy 1.1186260 0.2875234 4.3520786 0.1121012
#>
#> Predictions w 95% CI (inp$msytype: s)
#> prediction cilow ciupp log.est
#> B_1990.00 56.7523819 30.1059283 106.9833428 4.0386976
#> F_1990.00 0.4446518 0.2086071 0.9477876 -0.8104637
#> B_1990.00/Bmsy 0.9315881 0.2917620 2.9745352 -0.0708645
#> F_1990.00/Fmsy 1.2002917 0.2809798 5.1274160 0.1825646
#> Catch_1990.00 24.7355116 15.3286450 39.9151741 3.2082399
#> E(B_inf) 50.1634075 NA NA 3.9152858
This imposes a Gaussian prior on logK with mean \(\log(300)\) and standard deviation 2. The third entry indicates whether the prior is used (1 means use, 0 means do not use). From the summary it is evident that the default priors were also imposed.
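A Gaussian prior of this form contributes the negative log density of the prior to the objective function; a sketch of that penalty at a candidate parameter value (the actual implementation is inside the TMB objective, so this is illustrative arithmetic only):

```r
# Penalty added by logK ~ dnorm[log(300), 2^2] when logK corresponds to K = 200
penalty <- -dnorm(log(200), mean = log(300), sd = 2, log = TRUE)
penalty   # about 1.63; grows as K moves further from 300
```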
Priors can also be applied to the random effects of the model, i.e. logB, logF, logBBmsy (which is \(\log(B/Bmsy)\)) and logFFmsy (which is \(\log(F/Fmsy)\)). An additional argument is required to specify these priors
inp <- pol$albacore
inp$priors$logB <- c(log(80), 0.1, 1, 1980)
par(mfrow=c(1, 2), mar=c(5, 4.1, 3, 4))
plotspict.biomass(fit.spict(pol$albacore), ylim=c(0, 500))
plotspict.biomass(fit.spict(inp), qlegend=FALSE, ylim=c(0, 500))
This imposes a Gaussian prior on logB with mean \(\log(80)\) and standard deviation 0.1 (very informative); the third entry in the vector indicates that the prior is used, and the fourth entry indicates the year to which the prior should be applied, here 1980.
It is clear from the plots that the prior influences the results significantly. Furthermore, it is not only the biomass in the year 1980 that is affected, but the information propagates forward and backward because all estimates are correlated. In reality such an informative prior is rarely available, however it may be possible to derive information about the absolute biomass from acoustic survey and swept area estimates. It is, however, critical that the standard deviation used reflects the quality of the information.
Model parameters can be fixed using phases as described previously. This technique can, however, only be used to fix model parameters and not derived quantities such as logalpha or logr (which is derived from logK, logm and logn). Fixing such a quantity can instead be achieved approximately by imposing a highly informative prior on it
inp <- pol$albacore
inp$priors$logn <- c(log(2), 1e-3)
inp$priors$logalpha <- c(log(1), 1e-3)
inp$priors$logbeta <- c(log(1), 1e-3)
fit.spict(inp)
#> Convergence: 0 MSG: relative convergence (4)
#> Objective function at optimum: -13.3777183
#> Euler time step (years): 1/16 or 0.0625
#> Nobs C: 23, Nobs I1: 23
#>
#> Priors
#> logn ~ dnorm[log(2), 0.001^2] (fixed)
#> logalpha ~ dnorm[log(1), 0.001^2] (fixed)
#> logbeta ~ dnorm[log(1), 0.001^2] (fixed)
#>
#> Fixed parameters
#> fixed.value
#> phi NA
#>
#> Model parameter estimates w 95% CI
#> estimate cilow ciupp log.est
#> alpha 1.0000039 0.9980458 1.0019658 0.0000039
#> beta 0.9999976 0.9980395 1.0019595 -0.0000024
#> r 0.5046213 0.1875581 1.3576733 -0.6839471
#> rc 0.5046222 0.1875581 1.3576783 -0.6839453
#> rold 0.5046231 0.1875573 1.3576886 -0.6839435
#> m 22.0086909 17.0613914 28.3905610 3.0914374
#> K 174.4569110 74.7554134 407.1305663 5.1616778
#> q 0.3549421 0.1278982 0.9850325 -1.0358005
#> n 1.9999964 1.9960802 2.0039202 0.6931454
#> sdb 0.0966006 0.0704056 0.1325417 -2.3371705
#> sdf 0.2102260 0.1562925 0.2827708 -1.5595721
#> sdi 0.0966010 0.0704060 0.1325419 -2.3371666
#> sdc 0.2102255 0.1562923 0.2827699 -1.5595745
#>
#> Deterministic reference points (Drp)
#> estimate cilow ciupp log.est
#> Bmsyd 87.2283941 37.377636 203.5653831 4.468530
#> Fmsyd 0.2523111 0.093779 0.6788392 -1.377093
#> MSYd 22.0086909 17.061391 28.3905610 3.091437
#> Stochastic reference points (Srp)
#> estimate cilow ciupp log.est rel.diff.Drp
#> Bmsys 86.1721785 37.1707121 199.771377 4.456347 -0.012257037
#> Fmsys 0.2500268 0.0921861 0.678122 -1.386187 -0.009136167
#> MSYs 21.5429413 16.5473161 28.046743 3.070048 -0.021619591
#>
#> States w 95% CI (inp$msytype: s)
#> estimate cilow ciupp log.est
#> B_1989.00 59.8081015 20.7927035 172.031934 4.0911411
#> F_1989.00 0.4264686 0.1528595 1.189821 -0.8522164
#> B_1989.00/Bmsy 0.6940535 0.4759258 1.012154 -0.3652062
#> F_1989.00/Fmsy 1.7056917 1.0390604 2.800015 0.5339707
#>
#> Predictions w 95% CI (inp$msytype: s)
#> prediction cilow ciupp log.est
#> B_1990.00 54.5156897 17.2286845 172.500716 3.9984885
#> F_1990.00 0.4313747 0.1427056 1.303973 -0.8407781
#> B_1990.00/Bmsy 0.6326368 0.3741020 1.069840 -0.4578588
#> F_1990.00/Fmsy 1.7253140 0.9361420 3.179762 0.5454091
#> Catch_1990.00 22.6023846 14.8441308 34.415474 3.1180554
#> E(B_inf) 20.7049018 NA NA 3.0303705
The summary indicates that the priors are so informative that the quantities are essentially fixed. It is also noted that the estimates of these quantities are very close to the mean of their respective priors.
Particular caution is required when fixing a parameter that is highly correlated with other parameters because this will to some extent restrict the estimates of the correlated parameters. This could also be a problem when specifying priors depending on the amount of a priori information available.
The presence of extreme observations may inflate estimates of observation noise and increase the general uncertainty of the fit. To reduce this effect it is possible to apply a robust estimation scheme, which is less sensitive to extreme observations. An example with an extreme observation in the catch series is
inp <- pol$albacore
inp$obsC[10] <- 3*inp$obsC[10]
res1 <- fit.spict(inp)
inp$robflagc <- 1
res2 <- fit.spict(inp)
sumspict.parest(res2)
#> estimate cilow ciupp log.est
#> alpha 8.01383375 1.14064160 56.30298903 2.0811693
#> beta 0.13903414 0.02002425 0.96535420 -1.9730357
#> r 0.25685420 0.10125070 0.65159133 -1.3592467
#> rc 0.73334989 0.13978955 3.84722638 -0.3101324
#> rold 0.85759756 0.00140610 523.05883239 -0.1536203
#> m 22.57344335 16.94359772 30.07391660 3.1167741
#> K 202.06196157 137.06067492 297.89023247 5.3085744
#> q 0.34766878 0.18846113 0.64137140 -1.0565050
#> n 0.70049565 0.06408927 7.65641660 -0.3559671
#> sdb 0.01371149 0.00196485 0.09568409 -4.2895209
#> sdf 0.37086365 0.26568565 0.51767886 -0.9919208
#> sdi 0.10988163 0.08106901 0.14893450 -2.2083516
#> sdc 0.05156271 0.00843494 0.31520243 -2.9649566
#> pp 0.95304961 0.72886830 0.99351826 3.0105755
#> robfac 20.83563743 2.67330917 236.13437947 2.9874802
par(mfrow=c(1, 2))
plotspict.catch(res1, main='Regular fit')
plotspict.catch(res2, qlegend=FALSE, main='Robust fit')
It is evident from the plot that the presence of the extreme catch observation generally inflates the uncertainty of the estimated catches, while the robust fit is less sensitive. Robust estimation can be applied to index and effort data using robflagi and robflage, respectively.
Robust estimation is implemented using a mixture of a light-tailed and a heavy-tailed Gaussian distribution, as described in Pedersen and Berg (2016). This entails two additional parameters (pp and robfac) that require estimation, which may not always be possible given the increased model complexity. In such cases these parameters should be fixed by setting their phases to -1.
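The mixture idea can be sketched as follows, reusing the names pp and robfac from the summary above (an illustration of the description, not the spict likelihood code):

```r
pp <- 0.95      # probability of the light-tailed component
robfac <- 20    # factor inflating the sd of the heavy-tailed component
sdobs <- 0.1    # base observation sd (illustrative value)
mixdens <- function(x) {
  # Mixture density: mostly narrow Gaussian, occasionally a much wider one
  pp * dnorm(x, sd = sdobs) + (1 - pp) * dnorm(x, sd = robfac * sdobs)
}
mixdens(0)
```

The heavy tail means an extreme observation is down-weighted rather than treated as ordinary observation noise.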
To make a catch forecast a forecast interval needs to be specified. This is done by specifying the start of the interval (inp$timepredc) and the length of the interval in years (inp$dtpredc). For example, if a forecast of the annual catch of 2018 is of interest, then inp$timepredc = 2018 and inp$dtpredc = 1. In addition to the forecast interval, a fishing scenario needs to be specified. This is done by specifying a factor (inp$ffac) to multiply the current fishing mortality by (i.e. the F at the last time point of the period where data are available) and the time that management should start (inp$manstart). The time point of the reported forecast of biomass and fishing mortality can be controlled by setting inp$timepredi. Producing short-term forecasts entails minimal additional computing time.
Forecasts are produced as part of the usual model fitting. To illustrate the procedure, a short example using the South Atlantic albacore dataset of Polacheck, Hilborn, and Punt (1993) containing catch and commercial CPUE data in the interval 1967 to 1989 is presented. The code to obtain the forecasted annual catch in the interval starting 1991 under a management scenario where the fishing pressure is reduced by 25% starting in 1991, and a forecasted index in 1992 is:
library(spict)
data(pol)
inp <- pol$albacore
inp$manstart <- 1991
inp$timepredc <- 1991
inp$dtpredc <- 1
inp$timepredi <- 1992
inp$ffac <- 0.75
res <- fit.spict(inp)
To specifically show forecast results use
sumspict.predictions(res)
#> prediction cilow ciupp log.est
#> B_1992.00 58.0404387 28.74512737 117.191776 4.06113999
#> F_1992.00 0.3348379 0.09426505 1.189374 -1.09410874
#> B_1992.00/Bmsy 0.9556116 0.23937468 3.814912 -0.04540377
#> F_1992.00/Fmsy 0.9006312 0.15380645 5.273748 -0.10465949
#> Catch_1991.00 18.8028897 9.48829836 37.261545 2.93401056
#> E(B_inf) 67.1854881 NA NA 4.20745727
This output is also shown when using summary(res). The results can be plotted using plot(res); however, to visualise the change in forecasted fishing mortality and the associated change in forecasted catch more clearly, use
par(mfrow=c(2, 2), mar=c(4, 4.5, 3, 3.5))
plotspict.bbmsy(res)
plotspict.ffmsy(res, qlegend=FALSE)
plotspict.catch(res, qlegend=FALSE)
plotspict.fb(res, man.legend=FALSE)
Note in the plot that the decrease in fishing pressure results in a constant biomass as opposed to the expected decrease if fishing effort had remained constant.
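The scenario arithmetic itself is simple: from manstart onwards the fishing mortality is the last estimated F multiplied by ffac. With an illustrative current F of 0.45 (not a value from the fit):

```r
ffac <- 0.75          # 25% reduction, as in the scenario above
Fcurrent <- 0.45      # illustrative current fishing mortality
Fmanaged <- ffac * Fcurrent
Fmanaged              # 0.3375
```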
The package has a function that runs several predefined management scenarios, which can be presented in a forecast table. To perform the calculations required to produce the forecast table run:
res <- manage(res)
where res is the result of fit.spict() from the code above. Then, the results can be summarised (and extracted) by running:
df <- mansummary(res)
#> Observed interval, index: 1967.00 - 1989.00
#> Observed interval, catch: 1967.00 - 1990.00
#>
#> Fishing mortality (F) prediction: 1992.00
#> Biomass (B) prediction: 1992.00
#> Catch (C) prediction interval: 1991.00 - 1992.00
#>
#> Predictions
#> C B F B/Bmsy F/Fmsy perc.dB perc.dF
#> 1. Keep current catch 25.3 47.7 0.545 0.785 1.465 -12.2 22.0
#> 2. Keep current F 23.9 52.9 0.446 0.870 1.201 -2.7 0.0
#> 3. Fish at Fmsy 20.6 56.3 0.372 0.926 1.000 3.6 -16.7
#> 4. No fishing 0.0 77.0 0.000 1.267 0.001 41.8 -99.9
#> 5. Reduce F 25% 18.8 58.0 0.335 0.955 0.901 6.9 -25.0
#> 6. Increase F 25% 28.5 48.1 0.558 0.793 1.501 -11.3 25.0
#>
#> 95% CIs of absolute predictions
#> C.lo C.hi B.lo B.hi F.lo F.hi
#> 1. Keep current catch 25.3 25.3 22.9 99.5 0.244 1.215
#> 2. Keep current F 12.5 45.6 24.0 116.4 0.126 1.586
#> 3. Fish at Fmsy 10.5 40.2 27.1 116.9 0.105 1.321
#> 4. No fishing 0.0 0.1 48.5 122.3 0.000 0.002
#> 5. Reduce F 25% 9.5 37.3 28.7 117.2 0.094 1.189
#> 6. Increase F 25% 15.5 52.5 20.0 115.9 0.157 1.982
#>
#> 95% CIs of relative predictions
#> B/Bmsy.lo B/Bmsy.hi F/Fmsy.lo F/Fmsy.hi
#> 1. Keep current catch 0.234 2.640 0.356 6.019
#> 2. Keep current F 0.211 3.595 0.205 7.032
#> 3. Fish at Fmsy 0.230 3.737 0.171 5.856
#> 4. No fishing 0.334 4.802 0.000 0.007
#> 5. Reduce F 25% 0.239 3.815 0.154 5.274
#> 6. Increase F 25% 0.184 3.413 0.256 8.790
Then df is a data frame with each row containing a line of the output
head(df)
#> C B F B/Bmsy F/Fmsy perc.dB perc.dF
#> 1. Keep current catch 25.3 47.7 0.545 0.785 1.465 -12.2 22.0
#> 2. Keep current F 23.9 52.9 0.446 0.870 1.201 -2.7 0.0
#> 3. Fish at Fmsy 20.6 56.3 0.372 0.926 1.000 3.6 -16.7
#> 4. No fishing 0.0 77.0 0.000 1.267 0.001 41.8 -99.9
#> 5. Reduce F 25% 18.8 58.0 0.335 0.955 0.901 6.9 -25.0
#> 6. Increase F 25% 28.5 48.1 0.558 0.793 1.501 -11.3 25.0
The resulting biomass, fishing mortality and catch of the management scenarios are included in the standard plots
par(mfrow=c(2, 2), mar=c(4, 4.5, 3, 3.5))
plotspict.bbmsy(res)
plotspict.ffmsy(res, qlegend=FALSE)
plotspict.catch(res, qlegend=FALSE)
plotspict.fb(res, man.legend=FALSE)
To obtain results for several forecast horizons without rerunning the model each time, set inp$timepredc equal to the longest horizon of interest. For example
inp <- pol$albacore
inp$timepredc <- 1991
res <- fit.spict(inp)
res <- manage(res)
Then the management table for 1990 is:
mansummary(res, ypred=1, include.unc = FALSE)
#> Observed interval, index: 1967.00 - 1989.00
#> Observed interval, catch: 1967.00 - 1990.00
#>
#> Fishing mortality (F) prediction: 1991.00
#> Biomass (B) prediction: 1991.00
#> Catch (C) prediction interval: 1990.00 - 1991.00
#>
#> Predictions
#> C B F B/Bmsy F/Fmsy perc.dB perc.dF
#> 1. Keep current catch 25.3 53.7 0.472 0.885 1.269 -4.9 5.7
#> 2. Keep current F 24.7 54.3 0.446 0.894 1.201 -3.9 0.0
#> 3. Fish at Fmsy 21.3 57.8 0.372 0.952 1.000 2.3 -16.7
#> 4. No fishing 0.0 79.1 0.000 1.303 0.001 40.0 -99.9
#> 5. Reduce F 25% 19.4 59.6 0.335 0.982 0.901 5.5 -25.0
#> 6. Increase F 25% 29.5 49.5 0.558 0.814 1.501 -12.5 25.0
and for 1991 is:
mansummary(res, ypred=2, include.unc = FALSE)
#> Observed interval, index: 1967.00 - 1989.00
#> Observed interval, catch: 1967.00 - 1990.00
#>
#> Fishing mortality (F) prediction: 1992.00
#> Biomass (B) prediction: 1992.00
#> Catch (C) prediction interval: 1991.00 - 1992.00
#>
#> Predictions
#> C B F B/Bmsy F/Fmsy perc.dB perc.dF
#> 1. Keep current catch 25.3 50.9 0.490 0.837 1.317 -10.0 9.7
#> 2. Keep current F 23.9 52.9 0.446 0.870 1.201 -6.5 0.0
#> 3. Fish at Fmsy 21.7 58.7 0.372 0.967 1.000 3.9 -16.7
#> 4. No fishing 0.0 100.4 0.000 1.652 0.001 77.6 -99.9
#> 5. Reduce F 25% 20.3 61.9 0.335 1.019 0.901 9.5 -25.0
#> 6. Increase F 25% 26.4 45.2 0.558 0.745 1.501 -20.0 25.0
catchunit - Define unit of catch observations
This will print the unit of the catches on relevant plots. Example: inp$catchunit <- "'000 t".
dteuler and eulertype - Temporal discretisation and time step
To solve the continuous-time system an Euler discretisation scheme is used. This requires a time step to be specified (dteuler). The smaller the time step, the more accurate the approximation to the continuous-time solution, at the cost of increased memory requirements and computing time. The default value of dteuler is \(1/16\), which seems sufficiently fine for most cases, and perhaps finer than needed for some. When fitting quarterly data or fast-growing species it is important to have a small dteuler. The influence of dteuler on the results can be checked by rerunning with different values and comparing the resulting model estimates. If dteuler <- 1 the model essentially becomes a discrete-time model with one Euler step per year.
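The role of dteuler can be illustrated with a deterministic Schaefer-type biomass update in plain R (a sketch of Euler stepping in general, not spict's Pella-Tomlinson code):

```r
# One year of biomass dynamics, dB/dt = r*B*(1 - B/K) - Fm*B, Euler-stepped
euler_biomass <- function(B0, r, K, Fm, dt) {
  B <- B0
  for (i in seq_len(round(1 / dt))) {
    B <- B + (r * B * (1 - B / K) - Fm * B) * dt  # one Euler step
  }
  B  # biomass after one year
}
c(coarse = euler_biomass(100, 0.5, 200, 0.2, dt = 1),
  fine   = euler_biomass(100, 0.5, 200, 0.2, dt = 1/16))
```

The coarse (annual) step and the default 1/16 step give noticeably different end-of-year biomass, which is exactly the approximation error that a smaller dteuler reduces.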
There are two possible temporal discretisation schemes which can be set to either eulertype = 'hard'
(default) or eulertype = 'soft'
. If eulertype = 'hard'
then time is discretised into intervals of length dteuler
. Observations are then assigned to these intervals. For annual and quarterly data dteuler = 1/16
is appropriate, however if fitting to monthly data dteuler
should be changed to e.g. 1/24
. If eulertype = 'soft'
(careful, this feature has not been thoroughly tested), then time is discretised into intervals of length dteuler
and additional time points corresponding to the times of observation are added to the discretisation. This feature is particularly useful if observations (most likely index series) are observed at odd times during the year. The model then estimates values of biomass and fishing mortality at the exact time of the observation instead of assigning the observation to an interval.
msytype - Stochastic and deterministic reference points
By default the stochastic reference points are reported and used for calculating relative levels of biomass and fishing mortality. It is, however, possible to use the deterministic reference points by setting inp$msytype <- 'd'.
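The deterministic reference points in the summaries follow the Pella-Tomlinson formulas and can be reproduced from the parameter estimates; checking against the fit above where n was fixed near 2 (m ≈ 22.009, K ≈ 174.457):

```r
# Deterministic Pella-Tomlinson reference points from (m, K, n)
drp <- function(m, K, n) {
  Bmsyd <- K * n^(-1 / (n - 1))  # biomass at MSY
  Fmsyd <- m / Bmsyd             # fishing mortality at MSY (since MSYd = m)
  c(Bmsyd = Bmsyd, Fmsyd = Fmsyd, MSYd = m)
}
drp(m = 22.0086909, K = 174.4569110, n = 2)
# close to the summary values: Bmsyd 87.228, Fmsyd 0.2523, MSYd 22.009
```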
do.sd.report - Perform SD report calculations
The sdreport step calculates the uncertainty of all quantities that are reported in addition to the model parameters. For long time series and small dteuler this step may have high memory requirements and substantial computing time. Thus, if one is only interested in the point estimates of the model parameters, it is advisable to set do.sd.report <- 0 to increase speed.
reportall - Report all derived quantities
If uncertainties of some quantities (such as reference points) are required, but uncertainties of the state variables (biomass and fishing mortality) are not, then reportall <- 0 can be used to increase speed.
optim.method - Set the optimisation method
Parameter estimation is by default performed using R's nlminb() optimiser. Alternatively it is possible to use optim by setting inp$optim.method <- 'optim'.
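Both are standard R optimisers; a minimal check on a toy quadratic (unrelated to any spict fit) shows they reach the same minimum:

```r
f <- function(p) sum((p - c(1, 2))^2)   # minimum at (1, 2)
p_nlminb <- nlminb(c(0, 0), f)$par      # quasi-Newton (PORT routines)
p_optim  <- optim(c(0, 0), f)$par       # default Nelder-Mead
round(p_nlminb, 3)
round(p_optim, 3)
```

For well-behaved likelihood surfaces the choice rarely matters; trying both is another cheap robustness check alongside check.ini().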
Pedersen, Martin W, and Casper W Berg. 2016. “A Stochastic Surplus Production Model in Continuous Time.” Fish and Fisheries. Wiley Online Library. doi:10.1111/faf.12174.
Polacheck, Tom, Ray Hilborn, and Andre E Punt. 1993. “Fitting Surplus Production Models: Comparing Methods and Measuring Uncertainty.” Canadian Journal of Fisheries and Aquatic Sciences 50 (12). NRC Research Press: 2597–2607.