Prepared by ASI AI Committee
AI-Indexing is web-based indexing software developed by Ben Vagle. It employs large language model (LLM) artificial intelligence (AI) to select terms for indexing, detect duplicates, and suggest relationships. Although AI-Indexing is not yet commercially available (we tested it via a provided access code), the webpage copy presents it as a finished product.
This review focuses on the tool’s overall usability from the perspective of a professional indexer. We found that AI-Indexing’s design, and its assumptions about how indexers work, are not compatible with professional indexing workflows and appear to be based on a fundamental misunderstanding about indexing: the belief that indexing is a simple matter of plucking terms from the text and placing them in a hierarchical structure. In fact, indexing is a complex series of judgments, made not once but over and over again, and grounded in multiple contexts, including the context of the book as a whole.
Overview
The first step is for the user to upload the text to be indexed in PDF format.
Next, AI-Indexing identifies keywords in the text and presents them to the user in alphabetical order, with page number(s) and a snippet of immediate context. The criteria for the selection of terms are not clear and are not modifiable. Users can click on any instance of a keyword to view the corresponding page of the PDF in a side window. From here, the user can delete or edit the terms. (There is no option to add subheadings at this stage.) To give a sense of scale, for a test document of 31 pages, the AI selected over 300 instances of keywords to review.
Next, AI-Indexing suggests which terms might be related to each other and offers the user the choice to merge a term with a broader term, make it a subentry, keep it separate, or provide a cross-reference to a different term. These choices are mutually exclusive, meaning that (for example) the user can’t double-post. This stage is a three-step process: step 1 is labeled “duplicate detection and relationship review,” step 2 is labeled “duplicate analysis and relationship analysis,” and step 3 is also labeled “duplicate analysis and relationship analysis.” It’s not clear whether the software always sends the user through these three substeps, or whether the length of the source document might affect how many times this stage is repeated.
After this, AI-Indexing generates subheadings (“subentries”), which the user reviews through the same entry-by-entry process as the original entries. The user can edit suggested subheadings and add subheadings to entries that don’t have suggested subheadings.
Finally, the user exports the index either as a finished index in .docx format or as an .ixml file that can be imported into Cindex for further editing.
Evaluation
AI-Indexing represents indexing as a series of isolated, sequential steps. At each step, there is no obvious way to return to earlier steps, to make temporary changes, or to save or revert to intermediate versions of the index. This limits flexibility, making every decision essentially irrevocable unless the index is edited afterwards in Word or Cindex. More importantly, this prevents the standard iterative approach that professional indexers take, wherein decisions are constantly revisited or revised to reflect the growing understanding of the book’s structure as the indexer works through the text.
AI-Indexing does not facilitate working through the text. The brief snippets of surrounding text were not sufficient to evaluate a term within the wider context of the book as a whole (although they did allow us to observe that the AI had indexed a number of passing mentions). While the ability to jump to the term’s corresponding page provides a larger window of context, the software still assumes that the indexer can gather appropriate context by visiting the entries in alphabetical order. This atomization of the text is the reverse of a professional indexer’s approach, which demands a deep understanding of the whole text, including the interrelated strands, concepts, and themes that run throughout.
In the second stage, dealing with duplicate and related terms, the purpose of each of the three steps was not clear. The criteria used to identify potentially related terms were not disclosed (and are presumably not modifiable by the indexer). The number of terms presented to the user was unwieldy; to give a sense of the amount of work required, for our 31-page test document, AI-Indexing presented us with 91 term pairs in step 1, 71 merged terms plus an additional 17 new term pairs in step 2, and 91 merged terms plus an additional 12 new term pairs in step 3.
Because the software is cloud-based, the user must upload the text to be indexed. AI-Indexing’s website states that uploaded text is not used to “train” its own or any other LLM; as with other LLM AI products, there is no way to verify this.
At each stage of the process, suggestions must be dealt with one at a time; there is no way to view a group of related entries, for example, or to edit a subset of entries in concert. This is time-consuming and makes consistency in phrasing more difficult.
Although AI-Indexing advertises itself as “professional indexing enhanced by secure, intelligent automation” and as “empowering indexers with advanced tools while preserving human expertise, decision-making, and complete data ownership,” its underlying assumptions about indexing lead it to work against, rather than support, indexers’ expertise and decision-making.
After this review was completed, the developer contacted the AI Committee to clarify that the product is still in beta. We have offered to post an updated review after the product’s release if significant changes have been made.