ABSTRACT

Using data collected by the Monterey Bay Aquarium Research Institute (MBARI) in Monterey Bay and employing stepwise multiple regression techniques, this study develops several equations that estimate primary productivity for Monterey Bay. The most significant equation, using integrated chlorophyll, photosynthetically active radiation (PAR), and surface phaeopigments as variables, explains 77% of the variance in primary productivity measurements. Another equation explains 70% of the variance using two variables, PAR and surface chlorophyll + surface phaeopigments, both of which are measurable by remote sensing. Tests of these equations on another MBARI data set, which includes oceanic sampling, demonstrate the possible applicability of these equations on a regional scale.

INTRODUCTION

Over the past decade, there has been increased interest in estimating primary production from satellite imagery. On a global scale, primary productivity and other oceanic processes have a very significant effect on the entire carbon cycle (Barber, 1990); quantifying global production would improve our understanding of that cycle. On a regional scale, estimates of primary productivity would be useful because phytoplankton crops are often patchy and can have large spatial and temporal variability. Current estimates of primary production over regional areas are generated from measurements taken at a few stations and extrapolated to the larger unsampled area. Thus, estimates of primary production using satellite data could increase the accuracy of total productivity assessments on a larger scale. Two kinds of satellite sensors would be used to collect the data for the estimates: the Coastal Zone Color Scanner (CZCS), which measures surface pigments, and the Advanced Very-High-Resolution Radiometer (AVHRR), which senses sea surface temperature.
Although satellite imagery provides some relevant data and appears to be a feasible means of remotely sensing primary productivity, current technology cannot measure all the biologically important variables that affect the rate of carbon uptake. For example, satellites can provide data on surface pigment levels, but much of the chlorophyll, representing the photosynthesizing phytoplankton, lies below the surface and is distributed throughout the euphotic zone. Other biological factors that help determine primary productivity include nutrient levels and the species composition of the phytoplankton crop, neither of which can be measured using existing satellite technology.

Despite the difficulties in using satellites, other data sets have been examined and equations have been created that explain some of the variance between production and surface chlorophyll levels. These equations have explained up to 60% of the variance in primary production. Eppley et al. (1985), for example, were able to explain 58% of the variance in the Southern California Bight after regressing production against daylength, pier temperature anomaly (a measure of interannual ocean temperature variability), and pigments.

This study examines whether a significant amount of the variance in Monterey Bay primary production can be explained by analyzing the Monterey Bay Aquarium Research Institute data set using multiple regression techniques. In the hope of learning more about the Monterey Bay ecosystem, selection of the variables was constrained not by satellite measurement capabilities but by the data that were available.

MATERIALS AND METHODS

The data set used for this analysis was collected by the Monterey Bay Aquarium Research Institute (MBARI) in Monterey Bay from 5 April 1989 to 2 May 1990. The data set, part of an ongoing project at MBARI, includes 31 single-day cruises, with 109 stations.
MBARI had four stations per cruise where primary productivity measurements were taken consistently: two offshore and two oceanic stations (see figure 1). To determine which biological factors play a role in explaining the variance between primary productivity and surface chlorophyll in the Monterey Bay system, the analysis includes ten variables and seven groups of phytoplankton. The variables used are not necessarily measurable by current satellite technology.

Temperature and salinity measurements were taken with a Seabird CTD system at several depths. For this analysis, only the surface temperature and salinity levels were used. To give an estimate of the mixed layer depth, the change in temperature between 0 and 40 meters is also included as a variable.

Water samples for the productivity measurements were taken at the depths shown, by a Secchi disk reading and calculation, to receive 100, 50, 30, 15, 5, 1, and 0.1% of the surface light. These depths throughout the euphotic zone were sampled by collecting water in Niskin bottles, transferring the samples into polycarbonate bottles, adding a known amount of 14C, and incubating the bottles on deck in mesh cylinders that admitted only the prescribed percentage of light for each depth. After 24 hours, carbon uptake was measured using the radioactive tracer, 14C. Using trapezoidal integration, production was calculated for the entire euphotic zone. This integrated carbon value, measured in mg C/m2/day, is used in this analysis as the value for primary productivity.

Water samples for the chlorophyll and nutrient levels were also collected in Niskin bottles at several depths throughout the euphotic zone. The chlorophyll levels at each depth were determined fluorometrically on a Turner Designs model 10-005 R fluorometer. Surface chlorophyll and phaeopigment levels were used as variables and, since satellites cannot distinguish between chlorophyll and phaeopigments, an additional variable combining the two was also used.
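The trapezoidal integration used to obtain a euphotic-zone total from the individual depth samples can be sketched as follows. This is only an illustrative Python sketch, not MBARI's processing code: the function name `integrate_profile` and the sample depths and uptake rates are hypothetical, and the sketch assumes the per-depth rates are expressed per cubic meter so that the integral comes out per square meter.

```python
def integrate_profile(depths_m, values):
    """Trapezoidal integral of a depth profile.

    depths_m: sample depths in meters, ordered from the surface downward.
    values:   the quantity measured at each depth (for example mg C/m3/day).
    Returns the depth-integrated value (for example mg C/m2/day).
    """
    if len(depths_m) != len(values):
        raise ValueError("depths and values must have the same length")
    total = 0.0
    # Add the area of each trapezoid between consecutive sample depths.
    for i in range(1, len(depths_m)):
        layer_thickness = depths_m[i] - depths_m[i - 1]
        mean_value = (values[i] + values[i - 1]) / 2.0
        total += layer_thickness * mean_value
    return total


# Hypothetical profile (not MBARI data): depths in m, uptake in mg C/m3/day.
example_depths = [0.0, 5.0, 10.0, 20.0, 40.0]
example_uptake = [30.0, 40.0, 20.0, 10.0, 0.0]
integrated_production = integrate_profile(example_depths, example_uptake)
print(integrated_production)  # depth-integrated uptake in mg C/m2/day
```

The same function applies unchanged to a chlorophyll profile, since only the units of the input values differ.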
In addition, the integrated chlorophyll value, calculated using trapezoidal integration on the individual depth chlorophyll values, was included as a variable. Because production occurs throughout the euphotic zone and not just at the surface, this integrated term provides a more accurate account of how much chlorophyll is actually present. Until a CZCS-type sensor is operating over Monterey Bay and its estimates of surface pigment levels are normalized to ship measurements, the shipboard pigment data can represent the satellite data for the purpose of this study.

The nutrients, nitrate and silicate, were analyzed on an Alpkem Rapid Flow Analysis (RFA) system using a slightly modified version of the methods of Whitledge et al. (1981). Only the surface nutrient levels were included in the analysis.

Irradiance, or photosynthetically active radiation (PAR), measured in mE/m2/day, represents the amount of sunlight available to phytoplankton for photosynthesis during a day; for example, the cumulative PAR for a cloudy day will be less than the cumulative PAR for a sunny day of equal length. MBARI's PAR sensor, located onboard its research vessel, measures light in volts, which can then be converted to mE/m2/day. Because the instrument records PAR only for the duration of the cruise and not for the entire day, consistent measurements of cumulative PAR for each cruise day were not available from an in situ source. Instead, onshore PAR data, recorded at the Monterey Bay Aquarium (MBA), are compared with the ship PAR data. When graphed on the same scale, the onshore and shipboard records correlate well with each other. Because MBA PAR data were not available for four cruise days, I compare the Aquarium's PAR sensor data with its solar irradiance data, measured in watts/m2/day, to generate the regression equation

watts = -5.12 + 0.47*PAR

Using this equation and solar irradiance data from MBA, PAR levels for those four days are calculated.
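Filling in the four missing days amounts to solving the regression equation for PAR. The short Python sketch below shows that inversion under two assumptions: that the mark between 0.47 and PAR in the equation is a multiplication sign, and that the units are those given in the text. The function names and the sample input are hypothetical.

```python
# Coefficients of the regression reported in the text: watts = -5.12 + 0.47*PAR
INTERCEPT = -5.12
SLOPE = 0.47


def watts_from_par(par):
    """Solar irradiance predicted from PAR by the regression equation."""
    return INTERCEPT + SLOPE * par


def par_from_watts(watts):
    """PAR estimated from solar irradiance by inverting the regression."""
    return (watts - INTERCEPT) / SLOPE


# Hypothetical irradiance reading for a day with no PAR record.
estimated_par = par_from_watts(1000.0)
print(estimated_par)
```

Inverting a regression of watts on PAR is not identical to regressing PAR on watts directly; the two agree closely only when the correlation is high, which the text indicates is the case here.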
The phytoplankton for the group analysis variables were collected in Niskin bottles at those stations where productivity measurements were taken. The organisms were counted by taxonomic group, classified according to shape and size, and run through an algorithm to calculate the actual µg C/liter/group. The groups included separate classifications for the following: picoplankton, ultraplankton, nanoplankton, heterotrophs, centric diatoms, pennate diatoms, both centric and pennate diatoms, and all groups as a single variable.

A correlation matrix with all the variables is used to identify those variables that correlate well with primary production. Stepwise multiple regressions on several models determine which variables explain the most variance in primary production. Because both carbon and chlorophyll measurements follow a lognormal distribution, these variables are used in the regression after a logarithmic transformation (Campbell, 1987).

I test the most significant models by comparing the observed primary production from another regional data set with the expected production generated by the regression equations. The data set used in this test was collected by MBARI in Monterey Bay and the waters of the adjacent California Current following the same methods described above. Figure 2 shows the stations used in this second data set. PAR data collected by MBA were also used with this second data set.

RESULTS

The correlation matrices indicate which variables correlate well with primary productivity (see table 1; all correlation coefficients mentioned are in this table). Not all of the well-correlated variables, however, can be used in the same model, because some of them represent the same variance of primary production.
For example, surface chlorophyll and integrated chlorophyll correlate well with each other; if both were used in the same model, the variance of primary production they explain would overlap and one could incorrectly conclude that one of the variables was not important.

Delta temperature, a variable with a relatively high r when correlated with primary production, is a significant variable in this system. Because of incomplete temperature data on several cruises, data points for delta temperature do not span the entire data set. To maintain the largest possible sample size, delta temperature and other variables with small sample sizes are not entered in all the models. Even though these variables are important, a larger sample size increases the significance of the models. To allow for the missing data, I analyze several multiple regression models.

Ten models are presented in table 2. Of the ten, model 10, which includes ln(surface chlorophyll + surface phaeopigments), PAR, nanoplankton, and pennate diatoms as the significant independent variables, explains the highest amount of variance, with r2 = 0.875. Although this regression is significant at the 0.0001 level, the small n of the plankton group data may make this model less reliable. Model 4, with r2 = 0.772, explains about 10% less of the variance than model 9, yet has a much larger sample size (n = 94). Note that model 2, using only PAR and ln(surface chlorophyll + surface phaeopigments), still explains over 70% of the variance and, more importantly, both variables are obtainable directly from a satellite data set.

In addition to explaining large amounts of the variance, model 2 and model 4 also incorporate variables that are available in the second MBARI data set. To test the applicability of these models, I compare observed primary production from the second data set with expected primary productivity values generated by model 2.
This yields r = 0.54, significant at the 1% level (see figure 3). Using the model 4 regression equation, the correlation between observed and expected increases to r = 0.77, p < 0.01 (see figure 4).

DISCUSSION

From table 2, one can see that the major variables, PAR, ln(integrated chlorophyll), and ln(surface chlorophyll + surface phaeopigments), explain comparable amounts of variance throughout the table. This observation, along with the significance levels, implies that the MBARI data set may be representative even at the smaller sample sizes. A thorough analysis of the regression models yields far more information than the regression equations alone.

With large sample sizes and r2 values over 0.60, the first three regression models include PAR, a measure of daylength and available light levels, and pigment concentrations. These appear in each model as the most significant variables. Because the amount of available light can be a limiting factor in production, the inclusion of PAR as a major variable in all the models is understandable. Chlorophyll measurements, which are also significant in every model, are a measure of biomass via the chlorophyll pigments present in all photosynthesizing organisms. A correlation between production and phytoplankton biomass is therefore expected.

Not only are PAR and pigments the most significant variables in these models, they are also measurable by satellites. Although phaeopigments are only associated with chlorophyll molecules and do not play a role in photosynthesis, both molecules have almost the same absorption spectrum, and thus a satellite senses the sum of both pigments (Smith and Baker, 1982). For an equation composed of variables measurable by satellite, this combined term is useful. Further, model 2, with surface chlorophyll plus phaeopigments and PAR, explains 9% more variance than model 1, which uses only surface chlorophyll as its pigment variable.
Model 3, using integrated chlorophyll, yields a slightly higher r2 than either model 2 or model 1. Photosynthesizing phytoplankton are distributed throughout the euphotic zone; because integrated chlorophyll measurements provide a value for the entire euphotic zone, not just the surface, integrated measurements are expected to correlate more strongly with primary productivity than surface values.

Model 4 also suggests that surface phaeopigments are a significant variable, explaining an additional fraction of the variance. Table 1 shows a correlation coefficient between surface phaeopigments and primary productivity of r = 0.528, almost as high as the surface chlorophyll correlation. It is unclear, from a biological standpoint, why surface phaeopigments would correlate so well with primary productivity. A possible explanation for the appearance of phaeopigments in this model is simply that phaeopigments are consistently present in much smaller concentrations, and their measurements are therefore more sensitive to fluctuations. Because satellites cannot measure phaeopigments or integrated chlorophyll directly, this model's precision, when applied to satellite data, may be lower than that of model 2. Knowing the precise surface chlorophyll values, however, one can estimate the integrated chlorophyll value to within 10%. Unfortunately, satellites cannot measure surface chlorophyll accurately; their precision is approximately 35% (Platt et al., 1988).

Models 5 and 6 introduce four new variables, all of which represent important biological information that can affect primary production. The results, however, show that surface salinity, temperature, nitrate, and silicate explain insignificant amounts of the variance in this system. Although salinity does not appear as a significant variable within these regression models, it does correlate well with primary production (r = 0.56). In addition, salinity correlates well with PAR, with r = 0.61.
PAR is, among this array of variables, the best seasonal marker; a correlation with salinity could therefore indicate that water mass movement in Monterey Bay is seasonal.

Surface temperature, while presented as a significant variable in other models (Balch et al., 1989), does not explain a significant amount of the variance in this system. In Monterey Bay, surface temperatures range only from 10 to 13 degrees C. This explains the low correlation between primary production and temperature (r = -0.007): in this system, temperature is essentially a constant. It also illustrates why creating a global equation to estimate primary production is not feasible; the factors that regulate primary production differ among oceanic systems.

Surface silicate and nitrate levels are also insignificant in these regression equations. This result does not mean that there is no relationship between primary production and nutrient levels. Nutrients are in fact vital for phytoplankton growth; blooms are common after upwelling events, which bring up nutrient-rich water. Considering the biology of phytoplankton and the frequency of upwelling, the relationship between primary production and nutrient levels may be nonlinear. My regressions can account only for linear relationships, and consequently present nutrients as insignificant. Because upwelling is so important on the Central California coast, including some index of upwelling in these models might improve their accuracy. Winds affect the rate of upwelling and the amount of mixing; using wind as a variable, with satellite wind data, could increase the amount of variance in primary productivity explained (Eppley et al., 1987).

Models 7 and 8 show delta temperature as a significant variable. Although the addition of delta temperature does decrease the sample size, the regressions are still significant. Delta temperature, or the change in temperature from 0 to 40 meters, is a measure of the mixed layer depth.
An increase in delta temperature indicates that the thermocline is deepening and the amount of mixing is increasing. As mixing increases, the phytoplankton cells are mixed deeper in the euphotic zone and receive less light, and consequently primary productivity decreases. From a biological perspective, it is logical that delta temperature explains a portion of the variance in primary productivity.

Some of the plankton group data, shown in models 9 and 10, explain significant amounts of the variance. As with delta temperature, the plankton data reduce the data set available for analysis. Despite this considerable reduction in sample size, models 9 and 10 are still significant at the 5% level. In both models, pennate diatoms are included as significant variables. Nanoplankton are only included in model 10. The inclusion of pennate diatoms in this model is not supported by the correlation matrices. While the correlation coefficient between primary production and pennate diatoms is not even significant, the coefficient between primary production and centric diatoms is r = 0.80, p = 0.0001. Nanoplankton, which also correlates well with primary productivity, has a high correlation with centric diatoms and shows no significant correlation with the pennate diatoms. Considering these overlapping correlations, the unanticipated inclusion of pennate over centric diatoms can be explained: the variance explained by nanoplankton is so similar to the variance explained by the centric diatoms that centrics become insignificant in a model that already includes nanoplankton.

Without doubt, the amount of phytoplankton present in an area will directly affect the primary production of that area. Because the raw population data on each group are run through an algorithm that converts these organisms to µg C/liter/group, these data show how much of each group was present at the site of the productivity measurements.
Variation, if any, in how much production each phytoplankton group is capable of generating cannot be determined from these data unless primary production is correlated directly to the amount of carbon in phytoplankton. More physiological information is necessary to analyze these results accurately.

Graphs of individual correlations between primary productivity and specific variables show several points with a marked overestimation of primary production. Figure 5, plotting primary production against PAR*integrated chlorophyll, shows several points far off the regression line. The composite variable, PAR*integrated chlorophyll, is used because it reflects more biological information than either individual variable and shows these outliers most clearly. Other studies (Platt et al., 1988) have used similar composite variables. The points labelled on figure 5 are summer cruise data points from stations C1 and H1, most showing overestimated productivity.

A possible explanation for the overestimation of primary production at these stations is that both C1 and H1 are offshore stations, while the other two stations used by MBARI are oceanic (see figure 1). The turbulence of an offshore station, the shallow depth at H1, and river input at C1 may all increase the amount of suspended sediment at these stations. Because suspended sediment blocks light, the true depth of the euphotic zone (defined by the 1% light level) is reduced. Primary productivity measurements are calibrated by the integrated chlorophyll measurements. Under these conditions, productivity could be overestimated because, if the phytoplankton are not reached by the light, they cannot photosynthesize and contribute as much production as calculated. Balch et al. (1989) used this same hypothesis to explain scatter in the Southern California Bight Study data set.
Why this phenomenon of overestimated primary production as a result of increased suspended sediment would occur only during the summer is still unclear. When these summer points are removed, the regression improves only slightly (see figure 6). Further, because some summer data points do not follow this pattern, more data collection is necessary to test this hypothesis.

Despite the unexplained scatter in the data set, these Monterey Bay models explain more variance than models formed for other regions. Better sampling techniques may be one explanation for these improvements in r2. MBARI uses acid-rinsed Niskin bottles fitted with non-toxic silicone tubing for improved productivity measurements.

Another advantage of these models, developed using only Monterey Bay data, is that they generate reasonable estimates of primary production in the nearby ocean area as well. The larger the area an equation represents, the more useful it will be when applied to satellite data. Figure 4 shows how well the model 4 equation, when applied to the second MBARI data set, calculates expected primary production.

Increasing the data set and incorporating wind data may improve these models even further. Even without these improvements, estimating primary productivity in Monterey Bay appears possible.

Acknowledgements: I would like to thank my advisor, Dr. Francisco Chavez, for his helpfulness and patience, and everyone else at MBARI who contributed programming time, answers to silly questions, and general moral support to me and my project throughout the quarter.

BIBLIOGRAPHY

Balch, W.M., M. Abbott, and R.W. Eppley (1989) Remote sensing of primary production-I. A comparison of empirical and semi-analytical algorithms, Deep-Sea Research, 36, 281-295.

Barber, R.T. (1990) Ocean productivity and global carbon flux, preprint volume of the Symposium on Global Change, Special Sessions on Climate Variations and Hydrology.

Campbell, J.W. (1987) Biological processes in the upper ocean: nature and consequences of lognormal variability (abstract), EOS, 68, 1696.

Eppley, R.W., E. Stewart, M.R. Abbott, and U. Heyman (1985) Estimating ocean primary production from satellite chlorophyll: Introduction to regional differences and statistics for the Southern California Bight, Journal of Plankton Research, 7(1), 57-70.

Eppley, R.W., E. Stewart, M. Abbott, and U. Owen (1987) Estimating ocean production from satellite-derived chlorophyll: insights from the Eastropac data set, Oceanologica Acta, Proceedings International Symposium on Equatorial Vertical Motion, 6, 109-113.

Platt, T., S. Sathyendranath, C. Caverhill, and M. Lewis (1988) Ocean primary productivity and available light: further algorithms for remote sensing, Deep-Sea Research, 35(6), 855-879.

Smith, R.C. and K.S. Baker (1982) Oceanic chlorophyll concentrations as determined by satellite (Nimbus-7 Coastal Zone Color Scanner), Marine Biology, 66, 269-279.

Figure Legend

Figure 1. A map of Monterey Bay showing the distribution of stations used by MBARI in the first data set. Data from these stations are used to create the regression equations.

Figure 2. A map showing the distribution of stations used by MBARI in the second data set. Data from these stations are used to test the equations created by the multiple regression techniques. An "X" shows stations included in the test data set. An "O" shows those stations where additional data are available, yet not included in this study.

Figure 3. Plot of observed versus expected primary productivity. Observed values are from the second MBARI data set and the expected values are generated using the model 2 regression equation, int carbon = 5.72 + 0.00029*PAR + 0.33*ln(surface chl + surface phaeo). The correlation coefficient, r = 0.54, is significant at the 1% level.

Figure 4. Plot of observed versus expected primary productivity.
Observed values are from the second MBARI data set and the expected values are generated using the model 4 regression equation, int carbon = 3.54 + 0.00024*PAR + 0.65*ln(int chl) + 0.063*surface phaeo. The correlation coefficient, r = 0.77, is significant at the 1% level.

Figure 5. Plot of primary production in Monterey Bay and the composite variable, PAR multiplied by integrated chlorophyll. The numbers in parentheses by certain data points indicate the Julian day on which the data was collected and the station at which the measurements were made. Only the data from summer cruises at stations H1 and C1 are labelled.

Figure 6. Plot of primary production in Monterey Bay and the composite variable, PAR multiplied by integrated chlorophyll. The labelled points shown in figure 5 are edited out, increasing the correlation coefficient from 0.81 to 0.83. For clarity, a logarithmic scale is used to spread out the clustered points and to show the relationship more clearly.

Table Legend

Table 1. Correlation coefficients mentioned in the text. Primary productivity, which is the integrated carbon value, is abbreviated "pp". The significance levels and the sample size are also given.

Table 2. Lists the ten multiple regression models discussed. Only the independent variables are shown. Integrated carbon is the dependent variable in each model. The partial r2 indicates how much variance is explained by each variable and the model r2 shows a cumulative total, with an asterisk denoting the total r2. The r2 values can also be read as percentages of the variance in primary productivity explained.

[Figure 1: map of Monterey Bay stations, first data set]
[Figure 2: map of stations in the second data set, marked X and O]
[Figure 3: observed primary production, ln(mg C/m2/day), against expected values from model 2]
[Figure 4: observed primary production, ln(mg C/m2/day), against expected values from model 4]
[Figure 5: primary production (mg C/m2/day) against PAR*integrated chlorophyll]
[Figure 6: primary production (mg C/m2/day) against PAR*integrated chlorophyll]

Table 1. Selected Correlation Coefficients

Variable-Variable               r         Prob
pp-schl                         0.571     0.0001
pp-schlsphaeo                   0.629     0.0001
pp-sphaeo                       0.528     0.0001
pp-int chl                      0.735     0.0001
pp-PAR                          0.603     0.0001
pp-surface temp                -0.007     0.9571 (ns)
pp-surface salinity             0.556     0.0001
pp-surface nitrate             -0.086     0.5127 (ns)
pp-surface silicate            -0.131     0.3127 (ns)
pp-delta temp                   0.544     0.0001
pp-all groups                   0.7?      0.0001
pp-autotrophs                   0.718     0.0001
pp-heterotrophs                 0.3?      0.012
pp-diatoms                      0.666     0.0001
pp-centric diatoms              0.797     0.0001
pp-pennate diatoms              0.136     0.3748 (ns)
pp-ultraplankton                0.7?      0.0001
pp-nanoplankton                 0.778     0.0001
pp-picoplankton                -0.161     0.2921 (ns)
schl-schlsphaeo                 0.9?      0.0001
schl-int chl                    0.792     0.0001
schl-autotrophs                 0.616     0.0001
schl-diatoms                    0.69?     0.0001
schl-pennate diatoms            0.494     0.0001
schl-centric diatoms            0.608     0.0001
surface salinity-PAR            0.608     0.0001
diatoms-pennate diatoms         0.691     0.0001
diatoms-centric diatoms         0.894     0.0001
delta temperature-PAR           0.461     0.0007
delta temperature-int chl       0.288     0.0401
nanoplankton-centrics           0.700     0.0001
nanoplankton-pennates           0.091     0.551 (ns)

A ? marks digits that are illegible in the source. Sample sizes (n) as extracted, row assignment not recoverable: 93 93 93 45 45 45 45 45 45 45 45 93 45 45 45 61 51.

Table 2. Stepwise multiple regressions

Model  Variable            Partial r2   Model r2   Prob>F
1      PAR                 0.448        0.448      0.0001
       schl                0.162        0.61*      0.0001
2      PAR                 0.557        0.557      0.0001
       ln(schl+sphaeo)     0.145        0.701*     0.0001
3      ln(int chl)         0.53         0.53       0.0001
       PAR                 0.23         0.75*      0.0001
4      ln(int chl)         0.555        0.555      0.0001
       PAR                 0.207        0.762      0.0001
       sphaeo              0.0102       0.772*     0.0465
5      PAR                 0.621        0.621      0.0001
       ln(schl+sphaeo)     0.153        0.774*     0.0001
       surface temp        ns           ns         ns
       surface silicate    ns           ns         ns
       surface nitrate     ns           ns         ns
       surface salinity    ns           ns         ns
6      ln(int chl)         0.493        0.492      0.0001
       PAR                 0.286        0.779*     0.0001
       surface nitrate     ns           ns         ns
       surface silicate    ns           ns         ns
       surface salinity    ns           ns         ns
       surface temp        ns           ns         ns
7      PAR                 0.471        0.471      0.0001
       ln(int chl)         0.26         0.73       0.0001
       delta temp          0.0288       0.7596*    0.0114
8      PAR                 0.616        0.616      0.0001
       ln(schl+sphaeo)     0.1174       0.7?       0.0001
       delta temp          0.033        0.767*     0.0106
9      ln(int chl)         0.558        0.558      0.0001
       nanoplankton        0.213        0.77       0.000?
       PAR                 0.091        0.862      0.0001
       pennate diatoms     0.0128       0.875*     0.0498
       centric diatoms     ns           ns         ns
       picoplankton        ns           ns         ns
       heterotrophs        ns           ns         ns
       ultraplankton       ns           ns         ns
10     ln(schl+sphaeo)     0.547        0.547      0.0001
       PAR                 0.245        0.792      0.0001
       nanoplankton        0.051        0.843      0.0007
       pennate diatoms     0.016        0.859*     0.0394
       centric diatoms     ns           ns         ns
       picoplankton        ns           ns         ns
       ultraplankton       ns           ns         ns
       heterotrophs        ns           ns         ns

* denotes final r2 for each model. A ? marks digits that are illegible in the source. Model numbers follow the order of the models in the source table. Sample sizes as extracted, row assignment not recoverable: 98 98.
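The model 2 and model 4 regression equations quoted in the legends of figures 3 and 4, and the observed-versus-expected comparison used to test them, can be sketched in Python as follows. This is an illustrative sketch under several assumptions rather than the original analysis: the left-hand side "int carbon" is taken to be the natural log of integrated carbon (the Methods describe a logarithmic transformation, and the figure axes read ln(mg C/m2/day)), the marks between coefficients and variables are read as multiplication, and the last model 4 coefficient is read as 0.063. The function names and the station values are hypothetical.

```python
import math


def model2_expected(par, surface_chl, surface_phaeo):
    """Model 2: expected ln(integrated carbon) from PAR and surface pigments."""
    return 5.72 + 0.00029 * par + 0.33 * math.log(surface_chl + surface_phaeo)


def model4_expected(par, int_chl, surface_phaeo):
    """Model 4: expected ln(integrated carbon) from PAR, integrated
    chlorophyll, and surface phaeopigments."""
    return (3.54 + 0.00024 * par + 0.65 * math.log(int_chl)
            + 0.063 * surface_phaeo)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical station records (not MBARI data):
# (PAR, integrated chl, surface phaeo, observed ln production)
stations = [
    (2000.0, 20.0, 0.2, 6.1),
    (4000.0, 50.0, 0.5, 6.9),
    (6000.0, 80.0, 1.0, 8.0),
]
expected = [model4_expected(par, chl, phaeo) for par, chl, phaeo, _ in stations]
observed = [obs for _, _, _, obs in stations]
print(pearson_r(observed, expected))
```

Correlating observed against expected values in this way mirrors the test reported for figures 3 and 4; it measures how well the two sets of values covary, not whether the expected values are unbiased.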