Walking Through Big Data: One Historian’s Path | History of Education Society, 6 November 2015, St. Louis, MO

Historians of education are known for their intensive, close examinations of documentary evidence. Through careful “document analysis,” one identifies patterns, anomalies, curious players, key events, policy, and change over time in whatever is deemed “educational.” Of late, a trend toward publishing what might be called case studies or microhistories has emerged. This includes my own work on American Indian education. In the broader fields of history and literary criticism, tools for analyzing large bodies of textual data have become prominent in discussions about the digital humanities. Our tiny subfield has just a handful of people who have been experimenting with such tools. Today, I will walk you along my meandering path as I try to understand the terrain of nineteenth-century children’s magazines: what they published and, by extension, what their publishers hoped children and families would learn.

I began along this path several years ago, as I was finishing research for my book Lessons from an Indian Day School. During a scattershot search in the serial set for anything related to Pueblo Indians, I came across several articles that recounted histories from a couple of Pueblo Indian communities. Charles Lummis, the notable (and, in the BIA’s eyes, notorious) adventurer, writer, librarian, and editor, published a number of these stories in St. Nicholas Magazine. This magazine had one of the longest runs of any children’s periodical in the US, from 1873 to 1939, and it had a significant average monthly circulation of 70,000 copies.¹ I tabled these stories for a while.

When I went back to them, I began to read each story deeply, and I was captivated by the ethnographic depth with which Lummis recounted the stories and the scenes in which they were told to him when he lived at Isleta Pueblo, not far from Albuquerque, from 1890 to 1892. I paid close attention to the coyote stories, as Coyote, among most if not all Pueblo Indian communities, is a trickster and thus a teacher. I went and studied Lummis’s papers, looking at many of his records from the years surrounding his time at Isleta and his subsequent work with Adolph Bandelier. I learned that Lummis was devoted to the Isleta community and had worked against the BIA to support Isletan parents’ right to determine the course of formal education for their children.² I also learned that the larger questions I had about colonization as an educative phenomenon had reached the limits of what exegesis could answer.

To examine how colonization might have been instructive (or not) to those who experienced it in the American West, I needed a broader approach. I needed to look at a much wider and deeper body of evidence. And I needed to be prepared for what I might or might not find. I had found the Lummis articles in the HathiTrust repository. Because downloading a year’s worth of issues at a time required an institutional subscription, I had downloaded each page individually in order to read what he had written. I felt like I had walked myself right into a rut that was ten years deep. And then the HathiTrust opened the HathiTrust Research Center, which allows researchers to create worksets of materials in the repository and run a number of algorithms to analyze the textual data. My rut was flattened.

What shape would my worksets take? In playing around with the tools for researchers, I found that I could identify volumes and issues of periodicals in the HathiTrust and cross-reference them with the Digital Public Library of America and WorldCat. I found, for example, that the HathiTrust has nearly the full run of St. Nicholas Magazine but only a portion of the Youth’s Companion. As I worked my way down the list of prominent nineteenth-century children’s magazines (those that ran the longest or had a significant circulation), I compiled worksets and ran what initially seemed to be elementary analyses of each: deployable word counts.

Because Lummis’s articles had piqued my interest in how non-Natives recounted significant tribal histories with a wealth of information about their environments and natural history, I hoped there would be many such examples of this co-optation. I also hoped that I would be able to identify patterns in what I was seeing. So I initially requested short lists, beginning with the top 2,000 words for a workset (or magazine collection). I found that I had to raise the list to 7,000 words to ensure that “Indian,” “Indians,” “savage,” “native,” and the like actually appeared in the word counts. From these word counts alone, I found that the Youth’s Companion had a much greater frequency of such words than St. Nicholas Magazine and the Peter Parley’s and Merry’s Museum magazines. Why was this the case, and would topic models show something that supported or refuted this seemingly buried indicator of interactions with American Indian communities in what is now the continental US?
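The HathiTrust Research Center ran its word counts on its own servers, so the following is not its algorithm. It is a minimal Python sketch of the two comparisons described above, assuming each magazine's volumes are already available as plain-text strings: whether target terms reach a top-N word list at all, and how often they occur relative to each collection's size. Normalizing per 10,000 tokens is an assumption added here, since raw counts favor whichever magazine has more surviving volumes. The term list and the toy sentences are illustrative only.

```python
import re
from collections import Counter

# Illustrative target terms, taken from the words named above.
TARGET_TERMS = {"indian", "indians", "savage", "native"}


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(texts):
    """Word frequencies across every text in one collection (e.g., one magazine)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def terms_in_top(texts, n, terms=TARGET_TERMS):
    """Which target terms make it into the collection's top-n word list?"""
    listed = {word for word, _ in word_counts(texts).most_common(n)}
    return terms & listed


def rate_per_10k(texts, terms=TARGET_TERMS):
    """Target-term occurrences per 10,000 tokens, so collections of different sizes compare fairly."""
    counts = word_counts(texts)
    total = sum(counts.values())
    hits = sum(counts[t] for t in terms)
    return 10_000 * hits / total if total else 0.0


if __name__ == "__main__":
    # Toy stand-ins for two magazine collections (not real magazine text).
    companion = ["The Indian guide led the party west.", "Native trails and Indian villages."]
    nicholas = ["The fairy queen and the little prince.", "A story of the seaside."]
    print(terms_in_top(companion, 2))  # {'indian'}
    print(rate_per_10k(companion), rate_per_10k(nicholas))  # 2500.0 0.0
```

Raising n in terms_in_top mirrors moving from a 2,000-word list to a 7,000-word list: rarer terms only surface once the list is long enough.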

And what, by the way, is topic modeling? Megan Brett, who published an introduction to topic modeling in the Journal of Digital Humanities in 2012, writes, “Topic modeling is a form of text mining, a way of identifying patterns in a corpus. You take your corpus and run it through a tool which groups words across the corpus into ‘topics’.”³ In other words, a “topic” is a group of words that appears repeatedly across a constellation of documents.⁴ A software program called MALLET, developed by Andrew McCallum at the University of Massachusetts Amherst, is the tool most humanists use for topic modeling, and the HathiTrust uses it as well. Though one can download MALLET for free through UMass Amherst, it is not intuitive to use. An interface called the Topic Modeling Tool, which one can download from GitHub, makes MALLET more usable. Or you can try out a couple of other tools, like the HathiTrust’s internal topic modeling algorithm, which will run your worksets as you see fit. Or there is Paper Machines,⁵ a Zotero plug-in that spits out beautiful stream graphs. It was important that I find a tool with a relatively intuitive interface for a couple of reasons: 1) I don’t code or understand the inner workings of the back-end software my machine runs on, and 2) I’m new to topic modeling, so I’m most interested in understanding how the process works and how it can help me most efficiently direct my energies in reading texts, or their reliefs, closely and contextually.
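MALLET fits latent Dirichlet allocation (LDA), a probabilistic topic model, using Gibbs sampling. To make “groups words across the corpus into ‘topics’” less of a black box, here is a toy, pure-Python sketch of collapsed Gibbs sampling for LDA. It is not MALLET's code and leaves out its speed optimizations and hyperparameter tuning; the values of alpha and beta, the iteration count, and the sample documents are assumptions chosen for illustration. The num_topics argument is the number a researcher chooses, such as 10 or 50.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns one dict per topic mapping word -> number of tokens assigned to it.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Repeatedly resample each token's topic given all the other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight of topic t: (how much this doc uses t) x (how much t uses word w).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def topic_top_words(topic_word, n=5):
    """The n most heavily weighted words in each topic: the lists people read as 'topics'."""
    return [
        [w for w, c in sorted(tw.items(), key=lambda x: (-x[1], x[0]))[:n] if c > 0]
        for tw in topic_word
    ]


if __name__ == "__main__":
    toy_docs = [
        "river canoe trail river camp canoe".split(),
        "school lesson teacher school book lesson".split(),
        "canoe river camp trail".split(),
        "teacher book lesson school".split(),
    ]
    for k, words in enumerate(topic_top_words(lda_gibbs(toy_docs, num_topics=2), n=4)):
        print(k, words)
```

Each token's topic is resampled in proportion to how much its document already uses a topic and how much that topic already uses the word, which is why words that keep appearing together tend to drift into the same topic.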

Initially I began tinkering with Paper Machines. I have used Zotero for a number of years, and I had, or could get, loads of PDFs. As I was creating worksets in the HathiTrust specific to discrete children’s magazine titles, I also downloaded the entire issues and volumes the HathiTrust had, saving the files on my hard drive, in the cloud, and in Zotero. When it came time to run the analyses, I was puzzled. I modified the stoplists (lists of words to exclude from the analysis) to nix words like “the” and other frequently used words, as well as personal names that appeared in the topic models Paper Machines kept spitting out. What I got was no clearer: I was still seeing mostly personal names across the years. After I read the Journal of Digital Humanities special issue on topic modeling, it became clear that while Paper Machines was beautiful and promised an illustrative visualization of topics over time, what it actually produced was not necessarily reliable. In his review, Adam Crymble notes that Paper Machines is more for getting one’s feet wet with data visualization than for creating robust analyses.⁶ Crymble attributes its lack of documentation and functionality to the fact that Paper Machines is an ad hoc tool developed by busy faculty members who aren’t necessarily computer scientists. My brief foray with the tool also suggested serious problems with the accuracy of PDF files that rely on hit-or-miss optical character recognition (OCR), the computer’s ability to “read” the text verbatim as we would. This frustration led me right back to the HathiTrust.
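A stoplist is simply a set of words dropped before the model sees the text. The Python sketch below applies a generic stoplist plus a list of personal names, and adds a crude OCR filter that drops one- and two-letter tokens and tokens with no vowel. The word lists and the filter are assumptions for illustration; MALLET's own import commands accept an additional stoplist file through the --extra-stopwords option.

```python
import re

# A tiny generic stoplist for illustration; real stoplists run to hundreds of words.
BASE_STOPLIST = {"the", "and", "of", "a", "to", "in", "was", "he", "she", "it", "his", "her"}
# Hypothetical personal names that kept dominating the topics.
NAME_STOPLIST = {"tom", "mary", "john"}


def clean_tokens(text, stoplist):
    """Tokenize text, drop stoplisted words, and drop likely OCR fragments.

    The OCR rule (keep tokens of 3+ letters containing a vowel or 'y') is a crude heuristic.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [
        t for t in tokens
        if t not in stoplist and len(t) > 2 and re.search(r"[aeiouy]", t)
    ]


if __name__ == "__main__":
    page = "Tom and Mary walked to the mill; tlie ii rrr river ran."
    print(clean_tokens(page, BASE_STOPLIST | NAME_STOPLIST))
    # ['walked', 'mill', 'tlie', 'river', 'ran']
```

The misread “tlie” survives both filters, which illustrates why stoplists alone could not rescue analyses built on hit-or-miss OCR.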

At this point, I revisited the handful of secondary sources (literally a handful exist) on nineteenth-century US children’s magazines to re-identify those that ran the longest or had a notable circulation relative to similar publications.⁷ I identified four children’s magazines on which to run my analyses: the Youth’s Companion, the Youth’s Friend, Peter Parley’s (which merged with Merry’s Museum), and St. Nicholas. As a group, these magazines ran from 1821 to 1943. I then created a workset containing these magazines and began running analyses.

Because the corpus I was analyzing includes 721 volumes, each analysis took 45 minutes to an hour. I ran three analyses, asking for 10, 10, and 50 topics respectively. I wanted to see how consistent the analyses were (hence the two 10-topic runs) and how granular they could get (with 50 topics). What I found was some gibberish and some very curious groupings.
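Topic numbers are arbitrary from run to run, so checking whether two 10-topic runs agree means matching topics by their words rather than by their numbers. One simple way to do this, assumed here rather than drawn from the HathiTrust's tools, is to pair each topic with the topic in the other run whose top words overlap most, measured by Jaccard similarity (shared words divided by all distinct words). The topic word lists below are hypothetical.

```python
def jaccard(a, b):
    """Share of words two topics have in common: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_runs(run_a, run_b):
    """Pair each topic in run_a with its most similar topic in run_b.

    Each run is a list of topics; each topic is a list of its top words.
    Returns (index_in_a, best_index_in_b, similarity) triples.
    """
    matches = []
    for i, words_a in enumerate(run_a):
        best_j = max(range(len(run_b)), key=lambda j: jaccard(words_a, run_b[j]))
        matches.append((i, best_j, jaccard(words_a, run_b[best_j])))
    return matches


# Hypothetical top words from two runs; note the topics come out in a different order.
RUN_A = [["war", "soldier", "battle", "flag"], ["school", "lesson", "teacher", "book"]]
RUN_B = [["school", "lesson", "pupil", "book"], ["war", "battle", "army", "flag"]]

if __name__ == "__main__":
    for i, j, score in match_runs(RUN_A, RUN_B):
        print(f"run A topic {i} <-> run B topic {j}: {score:.2f}")
```

Scores near 1 mean a topic reappeared almost intact; low best-match scores flag topics that did not recur. The same matching could be run between the 50-topic analysis and a 10-topic one to see which fine-grained topics fold into which broad ones.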

Now, I’ve been writing about the history of American Indian education for several years. My current project focuses on colonization as an educative phenomenon over multiple generations. Given the detail and reception of Lummis’s articles about Isleta Pueblo in the 1890s, and the rapid movement of Euroamericans and African Americans west after the Civil War, I expected to see evidence of some sort of prominent discussion of colonization in the children’s magazines I was analyzing. But I could identify only one topic that could be read, with low inference, as directly addressing colonization: topic 9. This was interesting. I went back to Andrew Goldstone and Ted Underwood’s piece on topic modeling in the Journal of Digital Humanities, and they had this nugget of a reminder: “By forcing us to attend to concrete linguistic practice, topic modeling gives us a chance to bracket our received assumptions about the connections between concepts.”⁸ What I was seeing was what I was seeing. Why was I seeing this? What does this suggest about the editing process and intent of children’s magazines?

To respond to these questions, I returned to my ethnographic training. I closely examined each of the 50 topics that the HathiTrust analysis produced. I then looked across topics for supercodes, or supertopics, that could be identified with low inference, and I named them. The supertopic with the greatest number of topics, eight, I named Knowledge, Art, Learning. The next two supertopics each had six topics: Nature, Landscapes, Animals; and Patriotism, War, Legend. The following four supertopics each had four topics: Nuclear Family & Home, Manliness, Built Environment, and Time. Finally, the last three supertopics each had three topics: Advertisements + Membership, Technological Innovation + $$, and Transportation.
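The supertopic counts above can be checked with a few lines of arithmetic. This Python sketch assumes each of the 50 topics went into at most one supertopic, which the tally is consistent with but the text does not state outright; under that assumption, 45 topics were grouped and 5 fell outside any supertopic.

```python
# Number of topics per supertopic, as reported in the paragraph above.
SUPERTOPIC_SIZES = {
    "Knowledge, Art, Learning": 8,
    "Nature, Landscapes, Animals": 6,
    "Patriotism, War, Legend": 6,
    "Nuclear Family & Home": 4,
    "Manliness": 4,
    "Built Environment": 4,
    "Time": 4,
    "Advertisements + Membership": 3,
    "Technological Innovation + $$": 3,
    "Transportation": 3,
}
TOTAL_TOPICS = 50  # topics requested in the finest-grained run

# Assumes no topic was counted under two supertopics.
assigned = sum(SUPERTOPIC_SIZES.values())
unassigned = TOTAL_TOPICS - assigned

if __name__ == "__main__":
    for name, size in sorted(SUPERTOPIC_SIZES.items(), key=lambda kv: -kv[1]):
        print(f"{size:>2}  {name}")
    print(f"{assigned} of {TOTAL_TOPICS} topics grouped; {unassigned} left ungrouped")
```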

To check the accuracy of my coding, or naming, I returned to the first two analyses I had run on the Children’s Magazines workset, comparing my supertopics with the 10 topics produced in each. Generally, my supertopics were consistent with these analyses. Had I used only those two initial analyses, however, I wouldn’t have been able to make out Built Environment or Time, nor would Knowledge, Art, Learning have been prominent. This suggests that, by choosing how many topics to request relative to the number of documents analyzed, the researcher can calibrate the low-inference nuance with which a body of texts can be read. It also suggests that the topics produced can serve as a kind of validity check on the close reading of texts that historians are accustomed to doing.

What the topics and supertopics tell me, at this juncture, is that colonization was not framed as colonization. In children’s magazines, it manifested in specific forms of learning, the recounting of patriotic or legendary conflicts, the assumption of the nuclear family, the built environment, transportation, and what constituted manliness. This might sound flip, but these very topics pervaded the Office of Indian Affairs schooling system and its curriculum, as the study of federal policies toward American Indian communities in the nineteenth and early twentieth centuries shows. Outside the school, the narrative of manifest destiny has been an always already “fact” in US history. And this narrative was crafted by Euroamericans for Euroamerican audiences. Colonization, in other words, would appear to be pervasively ambient. How could this be, given the often violent encounters that readers of these magazines must have experienced or heard about? And what about discussions in adult literary magazines? Did they also background direct discussion of Euroamerican interactions with American Indian communities? What did this look like?

At this point, I’m finding that I need to do two things. First, I need to run similar analyses of prominent adult literary magazines to see what topics emerge. This might tell me whether there were parallels between the intended audiences. One of the underlying questions I now have, thanks to Don Warren and AJ Angulo’s work on agnotology in education history, is whether discussion of colonization was actively masked for children who might well have experienced confrontation firsthand. Second, I’m beginning to think about colonization and its associated policies differently. After conversations with colleagues in psychology and counseling, it seems worth examining trauma as a multigenerational phenomenon that has major implications for both policy formation and learning at the genetic level.⁹ This has been echoed experientially by my father-in-law, who was on the front lines in the Vietnam War. In a recent conversation, he remarked, “After ground combat, what is there to be afraid of?” This comment stopped me cold. Surely, he was not the first person to have this realization. What, then, are the educational implications of this? And how might this have manifested in popular literary magazines and policy in the nineteenth and early twentieth centuries?

^{1. R. Gordon Kelly, Children’s Periodicals of the United States, Historical Guides to the World’s Periodicals and Newspapers (Westport, CT: Greenwood Press, 1984), 378.↩}

^{2. See also John Gram, Education at the Edge of Empire: Negotiating Pueblo Identity in New Mexico’s Indian Boarding Schools (Seattle, WA: University of Washington Press, 2015).↩}

^{3. Megan R. Brett, “Topic Modeling: A Basic Introduction,” Journal of Digital Humanities 2, no. 1 (2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/.↩}

^{4. See also Miriam Posner, “Very Basic Strategies for Interpreting Results from the Topic Modeling Tool | Miriam Posner’s Blog,” October 29, 2012, http://miriamposner.com/blog/very-basic-strategies-for-interpreting-results-from-the-topic-modeling-tool/. Ted Underwood, “Topic Modeling Made Just Simple Enough,” The Stone and the Shell, accessed July 14, 2015. Scott Weingart, “Topic Modeling for Humanists: A Guided Tour,” The Scottbot Irregular, accessed October 21, 2015.↩}

^{5. Chris Johnson-Roberson and Jo Guldi, Paper Machines | Visualize Your Zotero Collections, 2012, http://papermachines.org/.↩}

^{6. Adam Crymble, “Review of Paper Machines, Produced by Chris Johnson-Roberson and Jo Guldi,” Journal of Digital Humanities 2, no. 1 (April 4, 2013): 77–80, http://journalofdigitalhumanities.org/2-1/review-papermachines-by-adam-crymble/.↩}

^{7. Mabel F. Altstetter, “Early American Magazines for Children,” Peabody Journal of Education 19, no. 3 (1941): 131–36. M. O. Grenby, “The Origins of Children’s Literature,” in The Cambridge Companion to Children’s Literature, ed. M. O. Grenby and Andrea Immel (Cambridge: Cambridge University Press), 3–18, accessed March 16, 2012. Peter Hunt, Children’s Literature, Blackwell Guides to Literature (Oxford, UK; Malden, MA: Blackwell Publishers, 2001).↩}

^{8. R. Gordon Kelly, Mother Was a Lady: Self and Society in Selected American Children’s Periodicals, 1865-1890, Contributions in American Studies, No. 12 (Westport, Conn: Greenwood Press, 1974). Children’s Periodicals of the United States, Historical Guides to the World’s Periodicals and Newspapers (Westport, Conn: Greenwood Press, 1984). Betty Longenecker Lyon, “A History of Children’s Secular Magazines Published in the United States from 1789 to 1899” (Ph.D., The Johns Hopkins University, 1942).↩}

^{9. See, for example, Rachel Yehuda et al., “Holocaust Exposure Induced Intergenerational Effects on FKBP5 Methylation,” Biological Psychiatry, 2015, http://www.sobp.org/journal.↩}

Walking Through Big Data: One Historian’s Path | History of Education Society, 6 November 2015, St. Louis, MO by Adrea Lawrence is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.