Results

import pandas as pd

Option 1

Here are the results of testing out the 4 different chain types for Retrieval QA.

  • Refine
  • Map Reduce
  • Re-rank
  • Stuff

Based on the results, Refine and Stuff were more accurate than Map Reduce and Map Re-rank.
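For readers unfamiliar with the four chain types, the sketch below shows roughly how each one combines the retrieved chunks. It is a simplified illustration, not LangChain's implementation: `llm` and `scored_llm` are hypothetical stand-ins (a function from prompt to answer, and one from prompt to an answer with a confidence score), and the prompt wording is invented for the example.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]                      # hypothetical: prompt -> answer
ScoredLLM = Callable[[str], Tuple[str, float]]  # hypothetical: prompt -> (answer, score)


def _prompt(context: str, question: str) -> str:
    return f"Context:\n{context}\n\nQuestion: {question}"


def stuff(llm: LLM, question: str, docs: List[str]) -> str:
    """Stuff: place every retrieved chunk into one prompt (one LLM call)."""
    return llm(_prompt("\n\n".join(docs), question))


def map_reduce(llm: LLM, question: str, docs: List[str]) -> str:
    """Map Reduce: answer per chunk, then combine the partial answers."""
    partials = [llm(_prompt(doc, question)) for doc in docs]
    return llm(_prompt("\n".join(partials), question))


def refine(llm: LLM, question: str, docs: List[str]) -> str:
    """Refine: answer from the first chunk, then revise with each later chunk."""
    if not docs:
        raise ValueError("refine needs at least one document")
    answer = llm(_prompt(docs[0], question))
    for doc in docs[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context:\n{doc}\n\n"
            f"Refine the existing answer to: {question}"
        )
    return answer


def map_rerank(scored_llm: ScoredLLM, question: str, docs: List[str]) -> str:
    """Map Re-rank: answer per chunk with a score, keep the top-scoring answer."""
    scored = [scored_llm(_prompt(doc, question)) for doc in docs]
    return max(scored, key=lambda pair: pair[1])[0]
```

This structure may help when reading the table: Refine answers often mention "additional context" because each call revises an earlier answer, while Map Reduce and Map Re-rank answers refer to "the provided portion" because each call sees a single chunk.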


#| label: tools-test
#| fig-cap: >
#|  Comparing results for initial questions
#|  across the four chain types.
df = pd.read_excel('Recording of Results on Test Questions (1).xlsx', nrows=5, usecols=list(range(14)), sheet_name="Option #1 Exp - fhl_Charifson")
# Chain the style calls: each access to df.style returns a new Styler, so the
# header style must be set on the same Styler that is displayed.
styler = df.style.set_table_styles([dict(selector="th", props=[('max-width', '200px')])])
left_aligned_df = styler.set_properties(**{'text-align': 'left'})
display(left_aligned_df)
Columns: Paper | RetrievalQA - Chain_Type | Q1 to Q12 (the test questions listed here)

  Q1. What is this paper about?
  Q2. Summarize the paper concisely with reference to materials and methods.
  Q3. Write a one sentence summary of the purpose of the paper
  Q4. Terms that may be used to identify an observation include “in the field”, “this study”, “observed”, “taken”, “collected”, “sampled”, “collection”, “seen”, “harvested”, “found”, etc. Does the paper include one or more observations?
  Q5. Does this paper contain observational or experimental research conducted in the natural environment or with organisms collected in nature?
  Q6. What are the scientific names of the species mentioned in this paper?
  Q7. Does the paper mention where the species were observed or collected, and if so, what locations are given?
  Q8. Can you give a more specific location?
  Q9. Are any coordinate locations given in latitude / longitude, and if so, what are they?
  Q10. In what habitat were the species found?
  Q11. Does the paper mention a year, date and/or time that species were collected or observed, and if so, what was mentioned?
  Q12. Are there any maps, figures, tables or diagrams in the paper?

Row 0
  Paper: fhl_2014_Charifson_34622.pdf; Snail Predation by Hemigrapsus nudus and Hemigrapsus oregonensis: Sex and Species Differences in Chela Size
  RetrievalQA - Chain_Type: Refine
  • Q1: the paper is still primarily focused on the relationship between carapace width and propus measures in H. nudus and H. oregonensis crabs. The additional context provided describes consumption rates by individual H. nudus crabs, which is not directly related to the content of the paper.
  • Q2: The paper utilized SMA regressions to examine the relationship between carapace width and propal height in Hemigrapsus species. The relationship was analyzed using a line of best fit from SMA regression, as shown in Figure 1. Descriptive statistics for female and male H. nudus and H. oregonensis can be found in Table 1a.
  • Q3: The purpose of the paper is to analyze the relationship between carapace width (CW) and propus measures (PH and PW) in different species of crabs, specifically examining the differences in propal width: carapace width ratio between sexes and species, with the sex factor being statistically significant while the species factor was insignificant, and a significant interaction between the two factors.
  • Q4: The additional context provided does not directly relate to the question at hand, which is whether the paper includes observations. Therefore, the original answer remains valid and does not need any refinement.
  • Q5: Based on the additional context provided, it is clear that the paper contains experimental research conducted in the natural environment or with organisms collected in nature. The mention of logistical issues surrounding the experimental design, the need for more crabs, and the acknowledgment of permission to collect organisms indicate that the research involved conducting experiments with live organisms in their natural habitats.
  • Q6: Based on the additional context provided, the scientific names of the species mentioned in the paper are: 1. Hemigrapsus nudus (female and male) 2. Hemigrapsus oregonensis (female and male)
  • Q7: The new context provided does not mention anything about the locations where the species were observed or collected. Therefore, the original answer remains the same.
  • Q8: Thank you for providing additional context. However, the new information does not specify the location of the study. The references you provided are book titles and do not specify a particular study or location. Therefore, the original answer remains the same: the specific location where this study took place is not mentioned.
  • Q9: The new context provided does not contain any coordinate locations given in latitude/longitude. Therefore, the original answer remains the same.
  • Q10: The new context provided does not give any information about the habitat in which the species were found. Therefore, the original answer remains unchanged.
  • Q11: Based on the new context provided, the paper does not mention a specific year, date, or time when species were collected or observed. The information provided in the new context is related to the statistical analysis of propal width: carapace width ratio between sexes and species of crabs. It does not provide any temporal information regarding the collection or observation of the species. Therefore, the original answer remains the same.
  • Q12: Based on the additional context provided, there is no mention of any maps, tables, or diagrams in the paper. However, there are figures in the paper. Figure 1, titled "Relationship of carapace width and propal height in Hemigrapsus," shows the relationship between carapace width and propal height in different species and genders of Hemigrapsus. The figure includes subfigures (a, b, c, d) for female and male Hemigrapsus nudus and Hemigrapsus oregonensis. Additionally, Table 1a provides descriptive statistics related to these figures.

Row 1
  Paper: nan
  RetrievalQA - Chain_Type: Map Reduce
  • Q1: Based on the provided text, it is difficult to determine the exact topic of the paper. However, it appears to be discussing the consumption rates of individual H. nudus (presumably a type of crab) and their consumption of snails. The text mentions the mean consumption rates of female and male crabs, as well as the fact that some individuals did not consume snails.
  • Q2: The paper analyzed the relationship between carapace width and propal height and width in two species of crabs, H. nudus and H. oregonensis. The authors used a two-way ANOVA to compare the differences in cw:ph and cw:pw. They conducted simple moving average (SMA) regressions and provided regression equations and R-squared values for each species and gender. The paper also included a figure showing the relationship between carapace width and propal width in different species of Hemigrapsus crabs, using a line of best fit from a SMA regression analysis. Descriptive statistics for the data can be found in Table 1a and 1b.
  • Q3: The purpose of the paper is to analyze the relationship between carapace width and propal height/width in different species of crabs, as well as the differences in these ratios between sexes and species.
  • Q4: Based on the provided portions of the document, it does not appear to include any specific observations.
  • Q5: Based on the provided information, it is not possible to determine whether the paper contains observational or experimental research conducted in the natural environment or with organisms collected in nature.
  • Q6: The scientific names of the species mentioned in this paper are Hemigrapsus nudus and Hemigrapsus oregonensis.
  • Q7: The provided portion of the document does not mention anything about the locations where the species were observed or collected.
  • Q8: nan
  • Q9: There is no mention of any coordinate locations or latitude/longitude in the given portion of the document.
  • Q10: The given portion of the document does not provide any information about the habitat in which the species were found.
  • Q11: No, the given portion of the document does not mention any specific year, date, or time related to the collection or observation of species.
  • Q12: There is no information provided in the given portion of the document about the presence of maps, figures, tables, or diagrams in the paper.

Row 2
  Paper: nan
  RetrievalQA - Chain_Type: Map Re-Rank
  • Q1: This document is about biometry and ecological character displacement.
  • Q2: This document provides a figure showing the relationship between carapace width and propal width in different species of Hemigrapsus crabs. The figure includes data for female and male H. nudus and H. oregonensis. The figure also includes a line of best fit from a simple linear regression analysis. For more detailed information, refer to Table 1b for descriptive statistics.
  • Q3: This document does not provide enough information to determine the purpose of the paper.
  • Q4: This document does not answer the question
  • Q5: This document contains experimental research conducted with organisms collected in nature.
  • Q6: h. nudus
  • Q7: This document does not mention where the species were observed or collected.
  • Q8: nan
  • Q9: No, there are no coordinate locations given in latitude/longitude in the context.
  • Q10: This document does not answer the question
  • Q11: No, the paper does not mention a year, date, or time that species were collected or observed.
  • Q12: Yes, there are figures in the paper. Figure 2 shows the relationship of carapace width and propal width in hemigrapsus. It includes four subfigures: a) female H. nudus, b) male H. nudus, c) female H. oregonensis, and d) male H. oregonensis. The line of best fit from sma regression is also shown. Table 1b provides descriptive statistics.

Row 3
  Paper: nan
  RetrievalQA - Chain_Type: Stuff
  • Q1: This paper is about the analysis of carapace width and propus measures in different species of crabs, specifically H. nudus and H. oregonensis. It also discusses the consumption rates of individual H. nudus crabs.
  • Q2: The paper analyzed the relationship between carapace width and propus measures in H. nudus and H. oregonensis crabs using a two-way ANOVA. The authors conducted SMA regressions to determine the relationship between carapace width and propal height and width, and provided descriptive statistics and line of best fit graphs for each species and sex.
  • Q3: The purpose of the paper is to analyze the relationship between carapace width and propus measures in different species of crabs and to compare the differences between sexes and species.
  • Q4: Based on the given context, it is not clear whether the paper includes one or more observations.
  • Q5: Based on the given context, it is not clear whether the paper contains observational or experimental research conducted in the natural environment or with organisms collected in nature. The context provided does not explicitly mention any observations being made or data being collected.
  • Q6: The scientific names of the species mentioned in this paper are Hemigrapsus nudus and Hemigrapsus oregonensis.
  • Q7: Based on the given context, it is not mentioned in the paper where the species were observed or collected. No specific locations are given.
  • Q8: nan
  • Q9: No, there are no coordinate locations given in latitude/longitude in the provided context.
  • Q10: The species were found in water and in finer sediment.
  • Q11: No, the paper does not mention a year, date, or time that the species were collected or observed.
  • Q12: Yes, there are figures in the paper. Figure 1 shows the relationship between carapace width and propal height in different species and genders of Hemigrapsus. Figure 2 shows the relationship between carapace width and propal width in the same species and genders. There are also tables mentioned in the context, such as Table 1a and Table 1b, which provide descriptive statistics.

Option 2

Here are the results from testing ConversationalRetrievalQA with the “Stuff” chain type, with and without Contextual Compression.

One challenge with retrieval is that usually you don’t know the specific queries your document storage system will face when you ingest data into the system. This means that the information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

Contextual compression is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing” here refers to both compressing the contents of an individual document and filtering out documents wholesale.
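To make the two kinds of "compressing" concrete, here is a toy sketch. It is not how LangChain implements contextual compression: a crude keyword-overlap test stands in for the LLM-based relevance judgment, and `compress_documents`, `_terms`, and the sample sentences are all hypothetical names and data made up for illustration.

```python
import re
from typing import List, Set


def _terms(text: str) -> Set[str]:
    """Lowercased words of 5+ letters, used as a crude relevance signal."""
    return set(re.findall(r"[a-z]{5,}", text.lower()))


def compress_documents(query: str, docs: List[str]) -> List[str]:
    """Keep only query-relevant sentences; drop documents with none.

    Assumption: keyword overlap replaces the LLM call that a real
    compressor would use to decide what is relevant to the query.
    """
    query_terms = _terms(query)
    compressed = []
    for doc in docs:
        # Split into sentences at whitespace that follows ., ! or ?
        sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
        # Compress within a document: keep sentences sharing a query term.
        kept = [s for s in sentences if _terms(s) & query_terms]
        # Filter wholesale: a document with no relevant sentence is dropped.
        if kept:
            compressed.append(" ".join(kept))
    return compressed


# Made-up documents, for illustration only.
docs = [
    "Crabs were collected at False Bay. Funding came from a grant.",
    "We thank the staff for advice.",
]
print(compress_documents("Where were the crabs collected?", docs))
```

With these sample inputs, the first document is shortened to its one relevant sentence and the second is removed entirely, so less text is passed on to the LLM.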

df = pd.read_excel('Recording of Results on Test Questions (1).xlsx', nrows=5, usecols=list(range(14)), sheet_name="Option #2 Exp - fhl_Charifson")
# Chain the style calls: each access to df.style returns a new Styler, so the
# header style must be set on the same Styler that is displayed.
styler = df.style.set_table_styles([dict(selector="th", props=[('max-width', '200px')])])
left_aligned_df = styler.set_properties(**{'text-align': 'left'})
display(left_aligned_df)
Columns: Paper | ConversationalRetrievalQA - Chain_Type | Q1 to Q12 (the test questions listed here)

  Q1. What is this paper about?
  Q2. Summarize the paper concisely with reference to materials and methods.
  Q3. Write a one sentence summary of the purpose of the paper
  Q4. Terms that may be used to identify an observation include “in the field”, “this study”, “observed”, “taken”, “collected”, “sampled”, “collection”, “seen”, “harvested”, “found”, etc. Does the paper include one or more observations?
  Q5. Does this paper contain observational or experimental research conducted in the natural environment or with organisms collected in nature?
  Q6. What are the scientific names of the species mentioned in this paper?
  Q7. Does the paper mention where the species were observed or collected, and if so, what locations are given?
  Q8. Can you give a more specific location?
  Q9. Are any coordinate locations given in latitude / longitude, and if so, what are they?
  Q10. In what habitat were the species found?
  Q11. Does the paper mention a year, date and/or time that species were collected or observed, and if so, what was mentioned?
  Q12. Are there any maps, figures, tables or diagrams in the paper?

Row 0
  Paper: fhl_2014_Charifson_34622.pdf; Snail Predation by Hemigrapsus nudus and Hemigrapsus oregonensis: Sex and Species Differences in Chela Size
  ConversationalRetrievalQA - Chain_Type: Stuff
  • Q1: This paper is about the differences in chela size between Hemigrapsus nudus and Hemigrapsus oregonensis and how it relates to snail predation.
  • Q2: The paper discusses the limitations of a previous study due to logistical issues and lack of replication. It suggests that future studies should include more crabs, exclude non-feeding crabs, and have longer trial periods. Additionally, it recommends using multivariate and geometric morphometric methods to quantify claw shape instead of ratios.
  • Q3: The purpose of the paper is to address the limitations of a previous study and propose improvements for future studies on crab feeding behavior and claw shape.
  • Q4: Yes, the paper includes one or more observations.
  • Q5: This paper contains both observational and experimental research conducted in the natural environment or with organisms collected in nature.
  • Q6: Hemigrapsus nudus and Hemigrapsus oregonensis.
  • Q7: The paper mentions that the species were observed and collected in Elkhorn Slough, Monterey County, California. (INCORRECT - Pulling from Citation Section)
  • Q8: The study was conducted in a rural area in northern California.
  • Q9: No, there are no coordinate locations given in latitude/longitude in the provided context.
  • Q10: The species Lacuna vincta were found in False Bay and Dead Man Cove. The species Littorina scutulata were found in the rocky intertidal zone of Fourth of July Beach.
  • Q11: The paper does not mention a year, date, or time that species were collected or observed.
  • Q12: Yes, there are figures in the paper.

Row 1
  Paper: nan
  ConversationalRetrievalQA - Chain_Type: Stuff w/o Contextual Compression
  • Q1: This paper is about the mechanics of fiddler crab claws and the need for future studies to improve experimental design and methodology.
  • Q2: The paper discusses the need for future studies to address logistical issues and increase replication. It also suggests using multivariate and geometric morphometric methods to quantify claw shape instead of ratios. The acknowledgements section mentions individuals and organizations that provided advice and financial support for the study.
  • Q3: The purpose of the paper is to discuss the logistical issues and limitations of a previous study on fiddler crab claw shape and suggest improvements for future studies.
  • Q4: "I don't know."
  • Q5: This paper contains experimental research conducted with organisms collected in nature.
  • Q6: Hemigrapsus nudus, Hemigrapsus orogenensis, Littorina scutulata, Lacuna vincta.
  • Q7: The paper mentions that the study was conducted at Friday Harbor Laboratories, University of Washington, Friday Harbor, WA 98250.
  • Q8: No, the specific location is not provided in the given context.
  • Q9: No, there are no coordinate locations given in latitude/longitude.
  • Q10: The species were found in the rocky intertidal zone.
  • Q11: No, the paper does not mention a year, date, or time that species were collected or observed.
  • Q12: Yes, there is a figure in the paper.

Row 2
  Paper: nan
  ConversationalRetrievalQA - Chain_Type: Stuff w/ Contextual Compression
  • Q1: This paper is about the differences in chela size between Hemigrapsus nudus and Hemigrapsus oregonensis and how it relates to snail predation.
  • Q2: The paper discusses the limitations of a previous study due to logistical issues and lack of replication. It suggests that future studies should include more crabs, exclude non-feeding crabs, and have longer trial periods. Additionally, it recommends using multivariate and geometric morphometric methods to quantify claw shape instead of ratios.
  • Q3: The purpose of the paper is to address the limitations of a previous study and propose improvements for future studies on crab feeding behavior and claw shape.
  • Q4: Yes, the paper includes one or more observations.
  • Q5: This paper contains both observational and experimental research conducted in the natural environment or with organisms collected in nature.
  • Q6: Hemigrapsus nudus and Hemigrapsus oregonensis.
  • Q7: The paper mentions that the species were observed and collected in Elkhorn Slough, Monterey County, California.
  • Q8: "I don't know."
  • Q9: No, there are no coordinate locations given in latitude/longitude.
  • Q10: The species Lacuna vincta were found in False Bay and Dead Man Cove. The species Littorina scutulata were found in the rocky intertidal zone of Fourth of July Beach.
  • Q11: The paper does not mention a year, date, or time that species were collected or observed.
  • Q12: Yes, there are figures (Figure 1 and Figure 2) and tables (Table 1A and Table 1B) in the paper.