
Walking Through Big Data: One Historian’s Path | History of Education Society, 6 November 2015, St. Louis, MO

Historians of education are known for their intensive, close examinations of documentary evidence. Through careful “document analysis,” one identifies patterns, anomalies, curious players, key events, policy, and change over time in whatever is deemed to be “educational.” Of late, publishing what might be called case studies or microhistories has become a trend; this includes my own work on American Indian education. In the broader fields of history and literary criticism, tools for analyzing large bodies of textual data have become prominent in discussions about the digital humanities. Our tiny subfield has just a handful of people who have been experimenting with such tools. Today, I will walk you along my meandering path in an attempt to understand the terrain of 19th-century children’s magazines: what they published and, by extension, what their publishers hoped children and families would learn.

I began along this path as I was finishing research for my book Lessons from an Indian Day School several years ago. As I did a scattershot search in the serial set for anything related to Pueblo Indians, I came across several articles that recounted histories from a couple of Pueblo Indian communities. Charles Lummis, the notable (and, in the BIA’s eyes, notorious) adventurer, writer, librarian, and editor, published a number of these stories in St. Nicholas Magazine. This magazine, in fact, had one of the longest runs of any children’s periodical in the US, from 1873 to 1939, and it had a significant average monthly circulation of 70,000 copies.1 I tabled these stories for a while.

When I went back to them, I read each story deeply, and I was captivated by the ethnographic depth with which Lummis recounted the stories and the scenes in which they were told to him when he lived at Isleta Pueblo, not far from Albuquerque, from 1890 to 1892.
I paid close attention to the coyote stories, as Coyote among most, if not all, Pueblo Indian communities is a trickster, and thus a teacher. I studied Lummis’s papers, looking at many of his records from the years surrounding his time at Isleta and his subsequent work with Adolph Bandelier. I learned that Lummis was devoted to the Isleta community and had worked against the BIA to support Isletan parents’ right to determine the course of formal education for their children.2 I also learned that the larger questions I had about colonization as an educative phenomenon had reached the limits of what the method of exegesis could answer.

To examine how colonization might have been instructive (or not) to those who experienced it in the American West, I needed a broader approach. I needed to look at a much wider and deeper body of evidence. And I needed to be prepared for what I might or might not find. I had found the Lummis articles in the HathiTrust repository. I had downloaded each page individually to read what he had written, because an institutional subscription was necessary to download a year’s worth of issues at a time. I felt like I had walked myself right into a rut that was ten years deep. Then the HathiTrust opened the HathiTrust Research Center, which allows researchers to create worksets of materials in the repository and run a number of algorithms to analyze the textual data. My rut was flattened.

What shape would my worksets take? In playing around with the tools for researchers, I found that I could identify volumes and issues of periodicals in the HathiTrust and cross-reference those with the Digital Public Library of America and WorldCat. I found, for example, that the HathiTrust has nearly the full run of St. Nicholas Magazine but only a portion of the Youth’s Companion. As I worked my way down the list of prominent nineteenth-century children’s magazines (those that ran the longest or had a significant circulation), I compiled worksets and ran what initially seemed to be elementary analyses of each: deployable word counts.

Because Lummis’s articles piqued my interest in how non-Natives recounted significant tribal histories with a wealth of information about their environments and natural history, I hoped that there would be many such examples of this co-optation. And I had hoped that I would be able to identify patterns in what I was seeing. So I initially requested short lists, beginning with the top 2,000 words for a workset (or magazine collection). I found that I had to raise the cutoff to 7,000 words to ensure that “Indian,” “Indians,” “savage,” “native,” and the like actually appeared in the word counts. Just with these word counts, I found that the Youth’s Companion had a much greater frequency of such words than St. Nicholas Magazine and Peter Parley’s and Merry’s Museum magazines. Why was this the case, and would topic models show something that supported or refuted this seemingly buried indicator of interactions with American Indian communities in what is now the continental US?
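For readers who want to try the same kind of check outside the HathiTrust Research Center, here is a minimal Python sketch of the two word-count questions above: do terms of interest survive a top-N cutoff, and how often do they occur per 10,000 tokens in a collection? It assumes each magazine’s text is already available as a plain string; the function names are hypothetical, and this is not the HTRC’s own algorithm. Normalizing by collection size is an added assumption worth making, because raw counts favor larger worksets.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count runs of ASCII letters as words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def terms_in_top_n(counts, terms, n):
    """Report, for each term of interest, whether it falls inside the n most frequent words."""
    top = {word for word, _ in counts.most_common(n)}
    return {term: term in top for term in terms}


def relative_frequency(counts, terms):
    """Occurrences of the terms per 10,000 tokens, so collections of different sizes compare fairly."""
    total = sum(counts.values())
    hits = sum(counts[term] for term in terms)
    return 10_000 * hits / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical stand-in text; in practice this would be a whole magazine workset.
    counts = word_counts("The Indian and the savage, the the boy")
    terms = ["indian", "indians", "savage", "native"]
    print(terms_in_top_n(counts, terms, 2))
    print(relative_frequency(counts, terms))
```

Raising `n` from 2,000 to 7,000 in `terms_in_top_n` mirrors the cutoff change described above; `relative_frequency` is the number to compare across magazines.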

And what, by the way, is topic modeling? Megan Brett, who published an introduction to topic modeling in the Journal of Digital Humanities in 2012, writes, “Topic modeling is a form of text mining, a way of identifying patterns in a corpus. You take your corpus and run it through a tool which groups words across the corpus into ‘topics’.”3 In other words, a “topic” is a group of words that appears repeatedly across a constellation of documents.4 A software program called MALLET, developed by Andrew McCallum at the University of Massachusetts Amherst, is the tool most humanists engaged in topic modeling use, and the HathiTrust uses it as well. Though one can download MALLET for free from UMass Amherst, it’s not intuitive to use. There is an interface called the Topic Modeling Tool, which one can download from GitHub, that makes MALLET more usable. Or you can try out a couple of other tools, like the HathiTrust’s internal topic modeling algorithm, which will run your worksets as you see fit. Or there is Paper Machines,5 a Zotero plug-in that spits out beautiful stream graphs. It was important that I find a tool with a relatively intuitive interface for a couple of reasons: 1) I don’t code or understand the inner workings of the backend software my machine runs on, and 2) I’m new to topic modeling, so I’m most interested in understanding how the process works and how it can help me most efficiently direct my energies in reading texts (or their reliefs) closely and contextually.
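To make the idea of a “topic” concrete, the sketch below implements latent Dirichlet allocation (LDA), the model family MALLET’s topic modeling is built on, using collapsed Gibbs sampling in plain Python. It is a toy for small corpora, not MALLET’s or the HathiTrust’s implementation, and the parameter values (`alpha`, `beta`, `iterations`) are illustrative assumptions. Each topic it returns is just a table of word counts; a topic’s most frequent words are what topic modeling tools display as “the topic.”

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA model with collapsed Gibbs sampling.

    docs: list of documents, each a list of word tokens.
    Returns one {word: count} dict per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens in doc d assigned to topic k
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                               # tokens assigned to topic k
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            k = rng.randrange(num_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts...
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...then resample its topic in proportion to how much the document
                # uses each topic times how much each topic uses this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=10):
    """List each topic's n most frequent words (ties broken alphabetically)."""
    return [
        [w for w, c in sorted(counts.items(), key=lambda item: (-item[1], item[0])) if c > 0][:n]
        for counts in topic_word
    ]


if __name__ == "__main__":
    # Hypothetical mini-corpus with two clearly different vocabularies.
    docs = [["river", "canoe", "fish", "water"] * 3, ["school", "lesson", "teacher", "book"] * 3] * 4
    for i, words in enumerate(top_words(lda_gibbs(docs, num_topics=2), n=4)):
        print(i, words)
```

The number of topics is an input the researcher chooses, not something the model discovers; that is why runs at 10 and at 50 topics can read a corpus at different levels of granularity.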

Initially, I began tinkering with Paper Machines. I have used Zotero for a number of years, and I had, or could get, loads of PDFs. As I was creating worksets in the HathiTrust that were specific to discrete children’s magazine titles, I also downloaded the entire issues and volumes that the HathiTrust had, saving the files on my hard drive, in the cloud, and in Zotero. When it came time to actually run the analyses, I was puzzled. I modified the stoplists (the words to exclude from the analysis) to nix “the” and other frequently used words, as well as personal names that appeared in the topic models Paper Machines kept spitting out. What I got was no clearer: I was still seeing mostly personal names across years. After I read the Journal of Digital Humanities special issue on topic modeling, it became clear that while Paper Machines was beautiful and promised an illustrative visualization of topics over time, what it actually produced was not necessarily reliable. In his review, Adam Crymble notes that Paper Machines is more for getting one’s feet wet with data visualization than for creating robust analyses.6 Crymble attributes its lack of documentation and functionality to the fact that Paper Machines is an ad hoc tool developed by busy faculty members who aren’t necessarily computer scientists. My brief foray with the tool also suggested that there may be serious issues with the accuracy of PDF files that rely on hit-or-miss optical character recognition, the computer’s ability to “read” the text verbatim as we would read it. This frustration led me right back to the HathiTrust.
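The stoplist step amounts to a filter applied to the text before any topics are computed. A minimal Python sketch, assuming plain-text input: the base stoplist here is a tiny illustrative sample (tools like MALLET ship much longer ones), the personal name passed in is hypothetical, and the minimum token length is an assumed heuristic for dropping very short fragments, which in OCR’d pages are often noise.

```python
import re

# Tiny illustrative stoplist; real stoplists run to hundreds of words.
BASE_STOPWORDS = {"the", "and", "of", "a", "an", "to", "in", "was", "he", "she", "it", "his", "her"}


def build_stoplist(extra_words=()):
    """Combine the base stoplist with researcher-supplied words, such as recurring personal names."""
    return BASE_STOPWORDS | {w.lower() for w in extra_words}


def tokenize(text, stoplist, min_length=3):
    """Lowercase the text, split it on anything that is not an ASCII letter,
    and drop stopwords and tokens shorter than min_length."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stoplist and len(t) >= min_length]


if __name__ == "__main__":
    # "Tom" stands in for a hypothetical character name that keeps dominating topics.
    stoplist = build_stoplist(["Tom"])
    print(tokenize("Tom and the boy went to the river", stoplist))  # prints ['boy', 'went', 'river']
```

Stopping a name removes it from every topic at once; the trade-off is that a name that genuinely carries meaning, such as a recurring historical figure, disappears too.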

At this point, I revisited the handful of secondary sources (literally only a handful exist) on nineteenth-century US children’s magazines to re-identify those that ran the longest or had a notable circulation relative to similar publications.7 I identified four children’s magazines on which to run my analyses: Youth’s Companion, Youth’s Friend, Peter Parley’s (which merged with Merry’s Museum), and St. Nicholas. As a group, these magazines ran from 1821 to 1943. I then created a workset inclusive of these magazines and began running analyses.

Because the corpus I was analyzing includes 721 volumes, each analysis took 45 minutes to an hour. I ran three analyses, asking for 10, 10, and 50 topics. I wanted to see how consistent the analyses were (hence the two runs at 10 topics) and how granular they could get (with 50 topics). What I found was some gibberish and some very curious groupings.
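One way to put a number on how consistent two runs with the same number of topics are is to compare their top-word lists. The Python sketch below is an illustration under stated assumptions, not something the HathiTrust reports: each run is a list of topics, each topic is a list of its top words, and similarity is Jaccard overlap (shared words divided by all distinct words in the pair). The function names are hypothetical.

```python
def jaccard(a, b):
    """Shared words divided by all distinct words; 0.0 when both lists are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_topics(run_a, run_b):
    """For each topic in run_a, find the most similar topic in run_b.

    Returns (index_in_a, index_in_b, similarity) tuples.
    """
    matches = []
    for i, words_a in enumerate(run_a):
        scores = [jaccard(words_a, words_b) for words_b in run_b]
        j = max(range(len(run_b)), key=scores.__getitem__)
        matches.append((i, j, scores[j]))
    return matches


def stability(run_a, run_b):
    """Mean best-match similarity: 1.0 means every topic in run_a reappears exactly in run_b."""
    matches = match_topics(run_a, run_b)
    return sum(score for _, _, score in matches) / len(matches)


if __name__ == "__main__":
    # Hypothetical top words from two runs of the same model.
    run_a = [["river", "canoe", "fish"], ["school", "lesson", "teacher"]]
    run_b = [["school", "lesson", "book"], ["river", "canoe", "fish"]]
    print(match_topics(run_a, run_b))  # prints [(0, 1, 1.0), (1, 0, 0.5)]
    print(stability(run_a, run_b))     # prints 0.75
```

Because each topic is matched to its best counterpart independently, two topics can share a partner. That many-to-one behavior is also what one would want when lining up a 50-topic run against a 10-topic run.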

Now, I’ve been writing about American Indian education history for several years. My current project focuses on colonization as an educative phenomenon over multiple generations. Given the detail and reception of Lummis’s articles in the 1890s about Isleta Pueblo, and the rapid movement of Euroamericans and African Americans west after the Civil War, I expected to see evidence of some sort of prominent discussion of colonization in the children’s magazines I was analyzing. But I could identify only one topic that could be read with low inference as directly addressing colonization: topic 9. This was interesting. I went back to Andrew Goldstone and Ted Underwood’s piece on topic modeling in the Journal of Digital Humanities, and they had this nugget of a reminder: “By forcing us to attend to concrete linguistic practice, topic modeling gives us a chance to bracket our received assumptions about the connections between concepts.”8 What I was seeing was what I was seeing. Why was I seeing this? What does this suggest about the editing process and intent of children’s magazines?

To respond to these questions, I returned to my ethnographic training. I closely examined each of the 50 topics that the HathiTrust analysis produced. I then looked across topics for supercodes, or supertopics, that could be identified with low inference. Then, I named them. The supertopic with the greatest number of topics, which I named “Knowledge, Art, Learning,” included eight topics. The next two supertopics each had six topics: “Nature, Landscapes, Animals” and “Patriotism, War, Legend.” The subsequent four supertopics each had four topics: Nuclear Family & Home, Manliness, Built Environment, and Time. Finally, the last three supertopics each had three topics: Advertisements + Membership, Technological Innovation + $$, and Transportation.

To check the accuracy of my coding, or naming, I returned to the initial two analyses I ran on the Children’s Magazines workset, comparing my supertopics with the 10 topics produced by each of those first two analyses. Generally, my supertopics were consistent with these analyses. Had I used only the initial two analyses, I wouldn’t have been able to make out Built Environment or Time, nor would Knowledge, Art, Learning have been prominent. This suggests that the researcher can set the number of topics relative to the number of documents analyzed in order to gauge how much nuance a low-inference reading of a body of texts can yield. It also suggests that the topics produced can serve as a type of validity check for the close reading of texts that historians are accustomed to doing.

What the topics and supertopics tell me, at this juncture, is that colonization was not framed as colonization. In children’s magazines, it manifested in specific forms of learning, the recounting of patriotic or legendary conflicts, the assumption of the nuclear family, the built environment, transportation, and what constituted manliness. This might sound flip, but as I have found in studying federal policies toward American Indian communities in the nineteenth and early twentieth centuries, these very topics pervaded the Office of Indian Affairs schooling system and its curriculum. Outside the school, the narrative of manifest destiny has been an always already “fact” in US history. And this narrative was crafted by Euroamericans for Euroamerican audiences. Colonization, in other words, would appear to be pervasively ambient. How could this be, given the often violent encounters that readers of these magazines must have experienced or heard about? And what about discussions in adult literary magazines? Did they also background direct discussion of Euroamerican interactions with American Indian communities? What did this look like?

At this point, I’m finding that I need to do two things. First, I need to run similar analyses of prominent adult literary magazines to see what topics emerge. This might tell me whether there were parallels between the intended audiences. One of the underlying questions I now have, thanks to Don Warren and AJ Angulo’s work on agnotology in education history, is whether discussion of colonization was actively masked for children who might well have experienced confrontation firsthand. Second, I’m beginning to think about colonization and its associated policies differently. After conversations with colleagues in psychology and counseling, it seems worth examining trauma as a multigenerational phenomenon that has major implications for both policy formation and learning at the genetic level.9 This has been echoed experientially by my father-in-law, who was on the front lines in the Vietnam War. In a recent conversation, he remarked, “After ground combat, what is there to be afraid of?” This comment stopped me cold. Surely he was not the first person to have this realization. What, then, are its educational implications? And how might this have manifested in popular literary magazines and policy in the nineteenth and early twentieth centuries?

1. R. Gordon Kelly, Children’s Periodicals of the United States, Historical Guides to the World’s Periodicals and Newspapers (Westport, CT: Greenwood Press, 1984), 378.

2. See also John Gram, Education at the Edge of Empire: Negotiating Pueblo Identity in New Mexico’s Indian Boarding Schools (Seattle, WA: University of Washington Press, 2015).

3. Megan R. Brett, “Topic Modeling: A Basic Introduction,” Journal of Digital Humanities 2, no. 1 (2012).

4. See also Miriam Posner, “Very Basic Strategies for Interpreting Results from the Topic Modeling Tool,” Miriam Posner’s Blog, October 29, 2012; Ted Underwood, “Topic Modeling Made Just Simple Enough,” The Stone and the Shell, accessed July 14, 2015; Scott Weingart, “Topic Modeling for Humanists: A Guided Tour,” The Scottbot Irregular, accessed October 21, 2015.

5. Chris Johnson-Roberson and Jo Guldi, Paper Machines: Visualize Your Zotero Collections, 2012.

6. Adam Crymble, “Review of Paper Machines, Produced by Chris Johnson-Roberson and Jo Guldi,” Journal of Digital Humanities 2, no. 1 (April 4, 2013): 77–80.

7. Mabel F. Altstetter, “Early American Magazines for Children,” Peabody Journal of Education 19, no. 3 (1941): 131–36; M. O. Grenby, “The Origins of Children’s Literature,” in The Cambridge Companion to Children’s Literature, ed. M. O. Grenby and Andrea Immel (Cambridge: Cambridge University Press), 3–18, accessed March 16, 2012; Peter Hunt, Children’s Literature, Blackwell Guides to Literature (Oxford, UK; Malden, MA: Blackwell Publishers, 2001).

8. R. Gordon Kelly, Mother Was a Lady: Self and Society in Selected American Children’s Periodicals, 1865-1890, Contributions in American Studies, No. 12 (Westport, Conn: Greenwood Press, 1974). Children’s Periodicals of the United States, Historical Guides to the World’s Periodicals and Newspapers (Westport, Conn: Greenwood Press, 1984). Betty Longenecker Lyon, “A History of Children’s Secular Magazines Published in the United States from 1789 to 1899” (Ph.D., The Johns Hopkins University, 1942).

9. See, for example, Rachel Yehuda et al., “Holocaust Exposure Induced Intergenerational Effects on FKBP5 Methylation,” Biological Psychiatry, 2015.

CC BY-NC-ND 4.0 Walking Through Big Data: One Historian’s Path | History of Education Society, 6 November 2015, St. Louis, MO by Adrea Lawrence is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
