Plan of Action (updated June 28, 2023)

High Level Method

Read the student papers and collect ground truth data (10 each)
Conversational Experiment making use of transformer LLM process for “assisted reading” of the papers
Identify entities & verify against authorities/knowledge bases
Interface for validation of each species occurrence
OCR process for other papers

Read Student Papers

We all develop a shared understanding of the papers and how they are structured (or not). Gather a list of questions that we can use to guide a conversational model.

Conversational Experiment

Build an experiment with already OCR’d set of papers based on how we identified SPOCs in step 1.

Some Options

Index all of the documents, ask for all the species occurrences across all the docs, in a list, with citation in the paper (paper ID, page). A button next to each one shows the excerpts from the paper with the evidence, highlighted. Link to review the full highlighted paper if necessary. 
Complex prompt to return dense chunks of information with species name, location, depth, habitat, etc. with explanation of the information source. 
Conversational verification, one paper at a time. Prompt for the first best guess at SPOCs from the given paper, further interrogation (memory required) to verify.

Identify entities

Our early experiments with OpenAI model to identify entities was very successful. We think this additional step will help us in the verification process by: 1. Making sure that the entities align with results of assisted reading 2. Allow us access to alternate name forms, visual verification (pictures!) and geo-location

Interface for validation

This step brings together the assisted reading with the entity (species, place, habitat, etc.) identification to help with the verification of the species occurrence.