Further thoughts on collectivity

January 27, 2021

On Monday, I was privileged to give a talk as part of the US Latino Digital Humanities speaker series at the University of Houston, titled “Building Collectivity in Digital Humanities Through Working With Data.” You can watch a recording of the talk via the Recovering the US Hispanic Literary Heritage YouTube channel. I am very honored to have kicked off their 2021 set of talks, and grateful to everyone who engaged, whether during the session or via Twitter.

In the talk, I argue that we’ve tended to see DH from an additive standpoint, focusing heavily on individual achievement, and driven to produce and acquire more and more – and I suggest that there is room to think of DH more collectively. During the talk, and after, I was really lucky to have it tweeted or live-tweeted by Quinn Dombrowski, Amanda Visconti, and Brandon Walsh, and I’m linking to Quinn and Brandon’s Twitter threads as short summaries.

I want to come back to one of the questions that was asked shortly after the talk, and I’m answering it with a blog post because … it might otherwise make for a long Twitter thread. Frederike Neuber wrote:

“In daily work, however, documentation is unfortunately often considered the ‘least important part’ when visible and functional results are expected (under time pressure). In the end, is it then better to publish something without documentation or should we not publish at all?”

This is an excellent question; and I’m delighted that Neuber asked it. There’s a lot in it to unpack, and I think it’s also a good opportunity for me to expand even further on the idea of what I mean by “collective.”


Just so it’s clear, I want to say a bit about what “documentation” might mean, in the context of a digital project, or any digital artifact or project component. It might be about many things: the provenance of a corpus, the compatibility of files, the choices that influenced materials’ creation, or even providing guidance that explains how to navigate a particular website. It could be lengthy, or a couple of paragraphs.

The phenomenon that Neuber is describing isn’t new to me at all. I suspect it’s familiar to many of us who’ve worked on digital projects, whether individually, or as part of a team (though I think that this is particularly endemic to team-based projects).

When someone publishes a scholarly essay, or a book, we expect to find a bibliography. If there were no Works Cited list, many of us would assume some sort of error in the publishing process. Most people would not say “oh, I guess they just didn’t have time to make a bibliography,” and most people, I think, would look askance at a note that said “Bibliography: coming soon!”

Including a bibliography with a scholarly work is a norm that we’ve bought into, as a community. Including documentation is not. One of the points I made in Monday’s talk is that in developing new programs, new processes, we do things over and over again, in the hope that by doing so, we’ll get used to the new processes, and they’ll become sustainable. That’s one possibility. But we need to ask ourselves: by repeating this process, are we making it sustainable? Or are we just normalizing overwork? Or, to come back to Neuber’s question, are we normalizing the idea that documentation is an afterthought, something we can let go of?

I want to come back to Max Liboiron’s question of what an ideal relationship between author and reader might look like. Imagine the author as the team behind a project, and the project as the thing that’s creating the relation between them. How might that relationship play out differently, with a project that includes documentation, vs. a project that does not? What is possible, and less possible, in each case? I encourage you to pause, here, and take time to think about how you would answer this question, because I do not assume that your answers will be the same as mine.

When I answer this question, I think that a project without documentation can be cited, can be pointed to as an example, can even be explored – but much depends on the reader, and their own prior knowledge, to determine whether the project will be legible as a research question to be engaged with, or, for example, as a set of visually interesting and clearly complex network visualizations. I think of this distinction as asking whether a reader can go beyond responding to a project by exclaiming “whoa, that’s cool!” Which readers can go beyond that reaction? And how far beyond “whoa, that’s cool!” can readers go? Can they start thinking about the research question in more detail, based on what documentation or apparatus the project team has provided? Can they start thinking about what has been included, or excluded? Could they discuss the project as a scholarly argument, or as part of their own scholarly argument? At the end of this post, I’ve added a short concrete example, featuring the project Six Degrees of Francis Bacon. If you want to go read that, and then come back up here, go ahead. Or you can just keep on reading, and look at the example at the end.

All of these questions shape what kind of relationship is possible between author and reader. And I would argue that documentation makes a massive difference in whether these questions can be answered, and how. In many cases, without documentation, they might be answerable and explorable by readers who have both intermediate to advanced technical knowledge, and intermediate to advanced knowledge of the subject area. But I doubt that most people building DH projects would say “our target audience is readers/users with advanced knowledge.”

So, we should ask ourselves: why doesn’t documentation get created? Depending on your context, the answers might look like any of the following, in some combination:

“We didn’t plan for it.”

“We didn’t plan for it, because our focus was on the computational analysis part of the project.” For “computational analysis part,” substitute “rendering the graphical user interface,” or “pulling and storing data from different sources,” etc.

“We’re just really overloaded, and we can’t overtax our staff further, and documentation was the part we could sacrifice.”

“We don’t have the knowledge to create documentation; that’s more the purview of the researcher.”

“The documentation isn’t interesting, from a research perspective. It’s an extra bonus, if we can get to it, but it’s not a priority.”

“Ah, yeah, we’re focusing on other projects right now, but the documentation is on our slate for next month.”

All of the above are realistic answers, in my experience. And they come from genuinely difficult places. I’m not the first person to note that one of the areas of conflict between digital humanities and traditional humanities publishing is that printed book projects have a major endpoint at the point of publication. They don’t need updates, or ongoing maintenance; and libraries and centers who work with researchers to build these projects are still grappling with the problem that success (meaning: many people wanting to work with them to build projects) would be thoroughly unsustainable, from a labor perspective.

So, we should ask ourselves, why isn’t documentation considered a priority? My bluntest answer is that in the same way that teaching is considered secondary to research, engaging with project readers/users is considered secondary to the work that goes into developing a particular algorithm, or set of algorithms, or other computational analysis method. In the rankings of what counts as scholarly achievement, it just doesn’t matter. If people are smart enough, they’ll figure it out without the documentation. And the deserving scholars will all find their way to tenure-track jobs, too.

A more subtle answer is that providing documentation feels less important because, much as it’s nice to have citations, the conventions around both technology and humanities research emphasize innovation in both overt and subtle ways, i.e., I’m doing something brand new. The idea of documentation is that someone else would be using your material – but that might mean they’d use your material in ways you hadn’t anticipated, and gosh, that might get weird; and also, is anyone really going to use my work that way, since it might imply that their own work wasn’t sufficiently original? I can put this more succinctly: the relationship between author and reader that good documentation facilitates is less one of author and reader – it’s one that’s transformative, ending as something closer to co-authorship – and that still feels like new and uncertain territory in the humanities. The fact that it isn’t the norm influences how we plan and execute our projects and their components. Still, I feel optimistic that this could change; is changing, in fact; it’s just that the change takes time.

This brings me to the question of what I mean by collectivity, or as another person asked this morning on Twitter, “how is reusing data different from citation; that is, how is reusing data collective?”

My answer to this is that you could cite a dataset – and comment, in an essay, about the categories that were included, or left out; or how many items are included in the dataset if you’re presenting it as evidence, or commenting on what sources it’s drawing from, etc. This is valid; lots of scholars might do this; there’s nothing wrong with it.

But if you’re using someone else’s dataset – meaning that you’re building a project on top of it, or combining it with another dataset that you’ve created, or expanding its categories – if you’re doing that, and doing it well – then that’s going to imply, and quite likely require, that you do more than just upload the dataset into your project. You’re going to need to spend time thinking carefully and critically about the scholarly assumptions and assertions that went into the dataset’s creation; and you’re probably going to engage with them in some way that involves you making scholarly arguments of your own.

I would describe that level of engagement as sufficiently distinct from simply citing a dataset, or even just uploading it, that I would call it collective.

To the same degree, I would distinguish between just tossing your dataset into GitHub or Figshare, and providing greater context and guidance, so that the critical and intellectual choices you made in producing that dataset are explicitly clear, rather than tacitly evident, or evident only to sufficiently advanced readers. And of course, this isn’t just about datasets. Matt Lincoln’s Twitter thread, whose first tweet I quoted in my talk, is a perfect example of the choices that come up when you’re thinking about releasing material and project components. Matt describes the thought process that he and folks at CMU went through on a couple of different projects, some of whose components were readily reusable while others weren’t. For the Index of DH Conferences project, as he notes, the code itself is likely too customized to reuse – but the data models that were developed could be. I’m highlighting Matt’s example not just because he explains it well, but because it demonstrates how much the thinking and choices will vary from case to case.

I call this level of effort collective because whether you are making use of someone else’s material or presenting material for reuse, the implication is that you are deliberately and carefully considering the needs, choices, and perspectives that others have brought, or may bring – and deliberately interacting with those. I think that the level of engagement required here, and our lack of reward for it, is one reason there haven’t been oodles and oodles of DH project reviews of the sort that are being published, or are in the works, at Reviews in DH, RIDE, and the journal Textual Cultures. I’m distinguishing it from citation, which is important, but can often be more transactional: I used your ideas, I gave you credit, we’re done.

It’s important to say here: collectivity can take a lot of work. I don’t have a standard for perfect collectivity, and I think that the effort to make everything we do maximally collective would start out with a massive case of vocational awe on a swift trajectory towards collapse. And indeed, when I originally threaded about this back in December, Scott Weingart wondered about something similar. The expectations and narratives around our labor (in libraries, in departments, in corporate settings (where documentation is also a challenge!)) don’t encourage this sort of collectivity; and we don’t get collectivity by sacrificing ourselves in pursuit of it. I also wouldn’t want us to abandon a common practice in digital humanities, of showing or publishing a work in progress early on, warts and all, in order to get feedback, just because it didn’t have full documentation.

So: where does that leave us, in terms of the original question? “…Is it then better to publish something without documentation or should we not publish at all?”

Part of this comes down to what we want. Liboiron’s essay asks readers to think about their ideal reading exchange between author and reader – and as creators, too, this is a question that we should give serious thought to – with and without the adjective “ideal,” if that’s helpful. What kind of interactions do we want to be possible? If we imagine from the start that one of our project goals is for the project to be reusable by others, how does that change how we define what the work of the project is, and how we plan for it? And while some projects include the option for users to be contributors, transcribing or otherwise adding info, that’s really only one type of relationship, and I don’t think it should be thought of as the natural default answer.

If the point of Monday’s talk, or this blog post were “don’t publish unless you can make it perfect,” then … well, the talk would have been a lot shorter, and this post wouldn’t need to be written.

There is no “just do this one thing, or even these five things, and then your project will be collective, not additive.” Even trying to classify projects, and say, “oh, well, this one is additive, and that one is collective” would be an oversimplification.

Nor should we be looking entirely to grant funding organizations like Mellon, Sloan, or the NEH to solve this for us, or to “change the game” so that this starts to happen more frequently. This implicates everyone: from researchers who conceive of projects, to researchers who evaluate them, to technical teams and project managers. And this won’t apply to all projects in the same way, depending on what they’re doing technically. If you really pushed me to tell you what action to take, I might say “what if you made the project scope half as big, in order to focus more on the documentation, and preparing the project material for reuse?” But that’s only one possible answer.

The answers to this problem won’t come just from me, or just from any one person, or group. But the reason that I’ve raised the question of additive and collective has to do with normalization. Some problems are so ubiquitous that they begin to seem not only unsolvable, but unworthy of discussion. People say “oh, you’re just complaining about that again; it’s a problem; we know; let’s move on.” I don’t want this aspect of digital scholarship to become like that – because if it becomes so difficult that we can’t even discuss it, then I know we’ll have lost. And indeed, responses to Monday’s talk demonstrated that people are working on this, with projects like The Digital Documentation Process, which I haven’t even had a chance to dig into yet. So, I say all this – not because I think we’re faced with questions like “should we not publish at all, if we can’t publish documentation?” – but because I think there are so many possibilities that we could be considering.

Epilogue: a concrete example

As a concrete example, consider the project Six Degrees of Francis Bacon. I want to make it clear: I’m using Six Degrees because I think it’s an important project, and because it has changed and developed further over time, which is a significant achievement. I also want to be explicit about the fact that Six Degrees is operating under the same pressures as the rest of us to be additive, and make more things – so, in using it as an example, I am explicitly not saying ‘why haven’t they done better?’

The project, in its current version, has documentation that helps people understand what the project, and its network visualizations, are showing. It hasn’t always had that documentation – here’s an earlier version, from 2016, via the Wayback Machine.

I’ve used Six Degrees of Francis Bacon in DH workshops for years, as part of an activity where small groups would spend some time looking at a project, and discussing what it was trying to do, what research questions were driving it, what questions they had, as scholars, or “readers,” of the project. And consistently, in those early years, I would encounter students, even early modernists, whose reaction to the project was “whoa, cool! … but I’m not sure how to use this, or what to do with it, beyond clicking around.” (In some cases, this was because they’d skipped past a video intro that loaded when they first opened the site, and they couldn’t figure out how to get the video to play again.) It would be oversimplifying to say that adding documentation makes Six Degrees collective – but I’d certainly call it a move towards more collective thinking. Later on, Six Degrees also made their data downloadable, and licensed for reuse, via the Folger Digital Collections site. That they’ve included an explicit license, and also some explanatory documentation about what the zip file contains, is also a move towards more collectivity.

Six Degrees could go even further. If you download the data, you might open the table of relationship types. And in perusing the list, you would see that the types include “Acquaintance of,” “Friend of,” and “Colleague of” – as well as “Rival of,” and “Enemy of.” If you were going to use the Six Degrees dataset in your own project, then you would probably need to have a better idea of what makes someone a rival, vs. an enemy; or when someone is a friend, vs. an acquaintance, in early modern terms. Or maybe you would be trying to use not the whole dataset, but their taxonomy of relationships, for a project with a slightly different timespan. More information could help you better understand whether and how you might want to use the same descriptors for relationship types, or whether you want to argue that a new label is needed. And when you published your own project, if you used Six Degrees’ data, you could choose to simply cite the dataset – or, you could choose to explain how you had modified it, and why.

I want to note that the progress that the project team has made over time is due in no small part to an NEH Digital Humanities Advancement Grant, which allowed the team to consider the project after its initial phases, and think about which directions they wanted to go in.

Further thoughts on collectivity - January 27, 2021 - Paige Morgan