Elisa Beshero-Bondar
Professor of Digital Humanities and Chair of the Digital Media, Arts, and Technology Program at Penn State Erie, The Behrend College.
DigitAI will have broad applications to:
Inspect eXtensible Markup Language (XML) and schemas (coded rules governing data structures)
Identify inconsistencies and recommend improvements in markup
(With agency) Apply transformation scripts that supply new encoding and help complete unfinished projects
This project was made possible, in part, by a seed grant from the Penn State Office of the Vice President for Commonwealth Campuses, and by Penn State funding for Undergraduate Research.
newtFire {dh} @
Customizable desktop AI system trained to apply the Guidelines of the Text Encoding Initiative (TEI)
1. Neo4j Knowledge Graph RAG: We attempted to map XML hierarchies (converted from XML to JSON with XSLT) into a property graph of nodes and relationships for search and retrieval.
"Code Bloat" Problem! XML's many dimensions of relationship (attributes, namespaces, and deep nesting) multiplied rapidly when converted to nodes and edges, producing a database that was less efficient than the source files.
Finalizing the graph was also a problem: it remained incomplete and difficult to update.
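To illustrate where the bloat comes from, here is a minimal Python sketch. It is not the project's XSLT => JSON pipeline; the function name `xml_to_graph`, the node and edge labels, and the sample fragment are our own assumptions. Every element, attribute, text run, and namespace becomes its own node joined by its own edge, so a fragment with six elements already yields sixteen nodes and twenty edges, and the JSON is several times longer than the XML it came from.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical TEI-like sample; not taken from the project's data.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p n="1">Letter from <persName ref="#mws" type="author">Mary</persName>
      to <persName ref="#pbs">Percy</persName>.</p>
  </body></text>
</TEI>"""


def xml_to_graph(xml_string):
    """Flatten an XML tree into property-graph nodes and edges (JSON-ready)."""
    root = ET.fromstring(xml_string)
    nodes, edges = [], []
    ns_nodes = {}  # namespace URI -> id of its single shared node

    def add_node(label, **props):
        nodes.append({"id": len(nodes), "label": label, "props": props})
        return len(nodes) - 1

    def add_text(owner_id, text):
        # Only non-whitespace text becomes a node; indentation is ignored.
        if text and text.strip():
            text_id = add_node("Text", value=text.strip())
            edges.append({"from": owner_id, "to": text_id, "type": "HAS_TEXT"})

    def visit(elem, parent_id):
        ns, _, local_name = elem.tag.rpartition("}")
        elem_id = add_node("Element", name=local_name)
        if parent_id is not None:
            edges.append({"from": parent_id, "to": elem_id, "type": "HAS_CHILD"})
        if ns:
            uri = ns.lstrip("{")
            if uri not in ns_nodes:
                ns_nodes[uri] = add_node("Namespace", uri=uri)
            edges.append({"from": elem_id, "to": ns_nodes[uri], "type": "IN_NAMESPACE"})
        for name, value in elem.attrib.items():
            attr_id = add_node("Attribute", name=name, value=value)
            edges.append({"from": elem_id, "to": attr_id, "type": "HAS_ATTRIBUTE"})
        add_text(elem_id, elem.text)
        for child in elem:
            visit(child, elem_id)
            # Tail text follows the child but belongs to this element's content.
            add_text(elem_id, child.tail)

    visit(root, None)
    return {"nodes": nodes, "edges": edges}


graph = xml_to_graph(SAMPLE)
print(len(graph["nodes"]), "nodes,", len(graph["edges"]), "edges")  # → 16 nodes, 20 edges
print(len(SAMPLE), "chars of XML vs", len(json.dumps(graph)), "chars of JSON")
```

A real property graph would also need ordering properties to preserve mixed content, which adds still more data per node.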
2. and 3. BGE-M3 & FAISS: Used to translate text into vector embeddings, index them, and filter information for RAG retrieval; they work rapidly and efficiently to filter the graph relationship data.
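FAISS's flat inner-product index (`IndexFlatIP`) performs exact, brute-force search; when vectors are L2-normalized, the inner product equals cosine similarity. The sketch below reproduces that idea in plain Python so it runs without FAISS installed. The three-dimensional vectors are hypothetical stand-ins for BGE-M3 embeddings of graph-relationship snippets, and the class name and `min_score` filter are our own illustrative choices, not FAISS's API.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class FlatInnerProductIndex:
    """Exact brute-force search, in the spirit of FAISS's IndexFlatIP (not its API)."""

    def __init__(self):
        self.vectors = []
        self.payloads = []

    def add(self, vector, payload):
        self.vectors.append(normalize(vector))
        self.payloads.append(payload)

    def search(self, query, k=2, min_score=0.0):
        """Score every stored vector, drop those below min_score, return the top k."""
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), payload)
            for v, payload in zip(self.vectors, self.payloads)
        ]
        kept = [pair for pair in scored if pair[0] >= min_score]
        kept.sort(key=lambda pair: pair[0], reverse=True)
        return kept[:k]


# Toy 3-d vectors standing in for real embeddings of graph-relationship snippets.
index = FlatInnerProductIndex()
index.add([0.9, 0.1, 0.0], "persName HAS_ATTRIBUTE ref")
index.add([0.6, 0.7, 0.1], "p HAS_CHILD persName")
index.add([0.0, 0.2, 0.95], "teiHeader HAS_CHILD titleStmt")

for score, snippet in index.search([1.0, 0.2, 0.0], k=2, min_score=0.5):
    print(f"{score:.3f}  {snippet}")
```

Brute-force scoring is linear in the number of stored vectors; FAISS adds optimized and approximate index types for larger collections.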
4. Qwen2-7B Language Model
Model Context Protocol (MCP): Replaces the static graph database with a new, dynamic system in which the small language model (SLM) applies and adapts MCP code scripts directly to raw XML files.
Tool-Augmented Agency: Instead of "searching" a database, the model acts as an agent capable of executing XML stack tools:
XPath/XQuery: For investigating and reporting on XML code patterns and inconsistencies
XSLT: For adding new markup or transforming the original XML.
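A minimal sketch of tool-augmented agency, assuming plain-Python stand-ins: two hypothetical tools, one XPath-style report (using ElementTree's limited XPath support) and one transformation standing in for an XSLT template, registered in a dictionary that an agent loop could dispatch to by name. This is not the MCP SDK or the project's scripts; the tool names, the `lookup` table, and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample document.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p>From <persName ref="#mws">Mary</persName> to <persName>Percy</persName>
and <persName>Claire</persName>.</p>
</body></text></TEI>"""


def report_missing_ref(root):
    """XPath-style query: list the text of persName elements that lack @ref."""
    return [p.text for p in root.findall(".//tei:persName", TEI_NS) if "ref" not in p.attrib]


def add_ref_from_lookup(root, lookup):
    """Transformation step (Python stand-in for an XSLT template): supply missing @ref."""
    changed = 0
    for p in root.findall(".//tei:persName", TEI_NS):
        if "ref" not in p.attrib and p.text in lookup:
            p.set("ref", lookup[p.text])
            changed += 1
    return changed


# Hypothetical tool registry an agent could call by name.
TOOLS = {"report_missing_ref": report_missing_ref, "add_ref_from_lookup": add_ref_from_lookup}


def call_tool(name, root, **kwargs):
    """Dispatch a named tool call against a parsed XML tree."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](root, **kwargs)


root = ET.fromstring(DOC)
print(call_tool("report_missing_ref", root))                               # → ['Percy', 'Claire']
print(call_tool("add_ref_from_lookup", root, lookup={"Percy": "#pbs"}))   # → 1
print(call_tool("report_missing_ref", root))                               # → ['Claire']
```

The report-then-transform-then-report loop mirrors the division of labor above: query tools diagnose, and transformation tools change the markup.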
Schema-Aware Reasoning: We provide the model with "starter scripts" and schema code (which describes the rules and structure of an XML project), allowing the SLM to navigate XML data according to its original logic rather than through a bloated translation.
More Efficient? Eliminating the "translation tax" of converting XML to a graph keeps the data footprint small while increasing query precision.
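Schema-aware checking can be sketched as walking the raw XML against a content model. Here a hypothetical, drastically simplified rule table stands in for a real project schema (TEI projects typically express theirs in ODD and compile it to RELAX NG); it lists each element's allowed children and required attributes, and the walker reports violations along with a path. The rules, sample, and function names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified rules; a real project would derive these from its schema.
RULES = {
    "body": {"children": {"div", "p"}, "required": set()},
    "div": {"children": {"head", "p"}, "required": {"type"}},
    "head": {"children": set(), "required": set()},
    "p": {"children": {"persName"}, "required": set()},
    "persName": {"children": set(), "required": {"ref"}},
}

DOC = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div><head>Letters</head>
    <p>To <persName>Percy</persName>.</p>
  </div>
  <div type="chapter"><persName ref="#pbs">Percy</persName><p>Done.</p></div>
</body>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rpartition("}")[2]


def check(elem, path="", problems=None):
    """Recursively compare elements with RULES and collect readable problem reports."""
    if problems is None:
        problems = []
    name = local_name(elem.tag)
    here = f"{path}/{name}"
    rule = RULES.get(name)
    if rule is None:
        problems.append(f"{here}: no rule for this element")
        return problems
    for attr in sorted(rule["required"] - set(elem.attrib)):
        problems.append(f"{here}: missing @{attr}")
    for child in elem:
        child_name = local_name(child.tag)
        if child_name not in rule["children"]:
            problems.append(f"{here}: <{child_name}> not allowed here")
        check(child, here, problems)
    return problems


for problem in check(ET.fromstring(DOC)):
    print(problem)
```

Because the checker reads the XML directly, each report points back to the source structure rather than to a translated copy of it.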
By Elisa Beshero-Bondar
DigitAI for Localized TEI/XML Assistance: A Poster