
Now that the initial rush of excitement over generative AI and LLMs has passed, those digging into the efficacy and sustainability of these tools are examining topics such as the provenance of the training data. The copyright status of that material seems to be the primary concern, with lawsuits pending against OpenAI[1] and Nvidia[2], among others. While the lawsuits over LLM training work their way through the courts, we wondered about something a little more clear-cut and perhaps even more basic.