Open Modernisms, Part IV

Dec 15, 2014

Shawna

In this five-part series of blog posts, I am recounting, by way of documentation, the process by which the modernist community has begun to create a free, digital anthology of modernism. Parts IV and V summarize the MODSOURCE email list in November-December 2014; this installment summarizes the major debates, while the next and final installment will focus on the technical details.

On November 14, Claire Battershill, Chris Forster, Andrew Pilsch, and I unleashed upon the Open Modernisms “board” a summary of the discussions we had that wove together various threads from an MSA email list exchange and from a meeting at the MSA conference about creating a new, open anthology of modernist source texts. As we debated back and forth in about five dozen emails, a few watchwords emerged even as we had specific disagreements about our methods: transparency, sustainability, simplicity, documentary (as in, we are aiming at creating “documentary editions” rather than “critical” or “variorum” editions). Much of what follows recapitulates questions and themes in the previous three blog posts on Open Modernisms, but with each turn of the screw came a slightly new conceptual purchase on the project that’s worth documenting.

Definitions

Each debate, at some point, returned to the definition of the project. During these exchanges, some really salient, concise, and/or lovely phrases emerged:

Matt Huculak: “a good modernist anthology that one can use in the classroom”
Matt: “a resource for our colleagues where they can go to one place, choose from a variety of primary source materials, and allow them to build a course reader for free, that can be used in the classroom for free, etc”
Chris: “a project where the value created is a function of scholarship”
Chris: “re-editing key modernist texts for a digital frame of reference”
Chris: “that set of manifestoes and primary secondary documents (i.e. essentially criticism) that the Rainey anthology focuses on”
Claire: approaching a “single anthology to cover the pedagogical needs of the MSA membership”
Claire: “less of a fixed table of contents and more of a repository of works that grows over time”
Claire: “a large, intensely bibliographic sort of creature”

These statements embed crucial criteria (beyond feasibility) that began to govern our decision-making, providing a pathway for navigating the discussions below. They were necessary because our mission “to make a new anthology of modernist source texts” is a far more slippery ideal than it might initially appear.

For example, Claire’s argument that “checking the list against existing anthologies’ tables of contents will be important to make sure we cover our bases but also that we go beyond what is already available in a neat package” was met with Chris’s exhortation that we still make our own idiosyncratic table of contents. Doing so will not only meet our primary objective of creating a twenty-first century anthology but also avoid the unsavory implication that we basically swiped Rainey’s intellectual property. After all, Chris pointed out, the editorial “process of selection represents effort and labor” that we should respect. I mentioned that shifting which parts of a given text, or which text to use from a given author, may also help reduce any “skeeve factor.” James Gifford provided a neat solution by proposing we simply “clarify that we’re both providing a redundancy to [other anthologies’] contents in another format /and/ opening other options as well as correcting troubles.” Stephen Ross pointed out that “[i]t’s not as though the texts we’re using are unique in other editors’ choosing,” and concluded quite reasonably,

overlap is inevitable and being guided by others’ previous choices in these matters is only responsible. Hat-tips are free, and good practice, though, so by all means let’s indicate clearly that we’ve been guided in some of our choices by other anthologies. But let’s also not forget that the whole project emerges from a sense that those anthologies are deficient in some way(s).

This exchange in many ways characterizes the spirit and structure of the debates we had about Open Modernisms: a statement met initially with approval would be found wanting in a certain way, and when we discovered the reasons behind the nit-picking, a reasonable compromise was struck (often not by “fixing” the initial problem aired but by addressing the actual concern in a more appropriate context).

Simplicity versus Long-range planning

Before these definitional statements emerged, Matt’s exhortation to simplify kicked off the board’s responses to the email generated by Chris, Claire, Andrew, and myself. Chris countered this exhortation with a McGannian injunction that we should be “re-editing the textual inheritance,” and that bringing simplicity to its logical conclusion (or extreme?) would not satisfy McGann’s injunction:

Caring about the minutiae of their representation, providing them with authoritative provenance and bibliographical richness and specificity, and (here, I know, I am out on a limb) enriching them with metadata and annotations of various kinds—that is what makes a project like this worthy of our time.

Philosophically, then, Chris and Matt (the latter supported by Stephen, the former supported by myself) disagreed heavily, but Claire Battershill and Alex Christie remarked that practically this disagreement does not automatically or necessarily jam the works. Claire pointed out that we might see “some benefit” if we take “multiple approaches to solving the same problems.” Matt himself suggested that we “run both streams (edition building and PDF making) simultaneously,” and James Gifford added, “[M]essy first, perfected seconds, or a plural set are all just fine to my mind, though there’s often a convenience to firsts.” Flexibility is, indeed, perhaps the best way to ensure continued collaboration among colleagues (rather than, say, the hierarchical allowances of assigning one’s students to do the labor!).

I still maintained the rightness of Chris’s comment that, if we have little editorial oversight over the texts, “why even bother with OCR?” We might indeed slap up searchable images in PDF form at a quick rate with very little work, but what is the scholarly value in that? Many of the texts we wish to make available through this digital anthology are in some way available in various scanned PDF forms right now, if the instructor only searches for them hard enough. Individual professors at present either do, or do not, choose to cobble these links together (praying that the PDFs don’t disappear the week before students are due to use them!) I agree emphatically that it is not worth our time to provide a service that would add no more value than a simple, centralized clearinghouse of hyperlinks to PDFs hosted elsewhere would. Saving someone time on the Google search bar is a nice service, but we want to provide something greater than a time-saving convenience—at least in the long run.

What are the tiers?

Matt, citing a student of his who is creating a digital edition of Gertrude Stein’s “Composition as Explanation” by converting page images into a searchable PDF, clarified his original three-tier proposition. Tier one would be a searchable PDF image, useful both as an immediate source to offer teachers of modernism and as a “top-level document” that will serve as the basis for further editions and also serve as proof of our scrupulous attention to copyright. The second tier (as I am understanding Matt’s proposal) requires posting those PDFs and making them available for custom coursepack production. The third tier would involve “deep editing” (textual encoding and metadata).

There was some pushback to Matt’s suggestion. Claire asked for clarification about whether he meant “asking for some brand new digitization,” as Chris pointed out that the problem is “their ownership of the images.” We could not use scans obtained from third parties if we followed that particular three-tier process to the letter. Internally using third-party scans as the basis for OCR/creating texts that we’ve processed through our own labor is not a problem, but the scanned images of texts can indeed be copyrighted even for a text out of copyright.

James provided an alternative set of steps to the 6-part workflow that I had written (see Part III of this blog post series) and to the 3-tier one that Matt offered (summarized above). James’s reconsideration of the process specified that we 1) digitize texts to be 2) turned into various image formats (PDF, TIFF, JPG) that will be 3) OCR’d. At this point, a “messy PDF” of uncorrected OCR will be made available. Our job continues, however, after the “messy PDF,” as 4) we process the uncorrected OCR into corrected and encoded text (whether as simple as Markdown or as complex as TEI). This processed text will be made available in various forms to our audience: HTML, RTF, ePub, PDF. Finally, 5) as desired, we could add more editorial apparatuses amounting to “full critical editions.”
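For readers who think better in code than in arrows and parentheses, here is a minimal sketch of James’s five steps. It is my own illustration, not anything the board wrote or adopted: the names `Stage`, `WORKFLOW`, `outputs_available`, and `pandoc_command` are hypothetical, and the wording of each operation is a paraphrase of the summary above. The pandoc invocation is built as a list and never run; I am assuming pandoc would handle the conversions only because it comes up later in our emails.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """One numbered operation and whatever it releases to readers (possibly nothing)."""
    number: int
    operation: str
    public_outputs: Tuple[str, ...]


# Paraphrase of the five steps summarized above (hypothetical structure).
WORKFLOW = (
    Stage(1, "digitize the source text", ()),
    Stage(2, "convert scans to image formats (PDF, TIFF, JPG)", ()),
    Stage(3, "run OCR", ("messy PDF",)),
    Stage(4, "correct and encode the text (Markdown or TEI)",
          ("HTML", "RTF", "ePub", "PDF")),
    Stage(5, "add editorial apparatus as desired", ("full critical edition",)),
)


def outputs_available(completed_stage: int) -> List[str]:
    """Return every public output released once the given stage is complete."""
    outputs: List[str] = []
    for stage in WORKFLOW:
        if stage.number <= completed_stage:
            outputs.extend(stage.public_outputs)
    return outputs


def pandoc_command(markdown_path: str, extension: str) -> List[str]:
    """Build (but do not run) a pandoc call; pandoc infers the output
    format from the extension of the file named after -o."""
    stem = markdown_path.rsplit(".", 1)[0]
    return ["pandoc", markdown_path, "--standalone", "-o", f"{stem}.{extension}"]
```

Calling `outputs_available(2)` returns an empty list, which is the point of the sketch: the first two operations are necessary but put nothing new in front of users, so the count of steps and the count of outputs need not match.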

James’s steps first put terror into my soul, as his system of arrows and numbers and parenthetical phrases distracted me from the essential reasonableness of his proposal. It even struck me as practical evidence that simplicity was hardly possible for this project (as Andrew and I argued) or desirable (as Chris argued). However, trying gallantly to distinguish good reasons from any egotistical desire that we work on revising my six-step enumeration as our starting point, I’ll admit that this reasoning grew on me, especially as James makes it clear that our numbered “steps” (operations) won’t have a one-to-one ratio to the outputs provided to our users (e.g., we recognize that some operations will not in themselves produce new outputs but are on the road to another output down the road).

Yet it still seemed to me and Chris that the “messy PDF” stage doesn’t offer our audience anything that the original scan of the text doesn’t offer. However, the “messy PDF” is an unavoidable stage, whether or not it is made available to users, so it is a bit of a moot point in terms of workflow. Further decisions about these matters were to be tabled, however, until the production of prototype texts (to be discussed in Part V).

To group or not to group?

In the November 14th initial email to the board, I had relayed Chris’s suggestion to have three overlapping personnel groups: an editorial board, a technical board, and textual editors. Matt cautiously and wisely responded,

I’m not a fan of breaking the group up into different groups. I’m more of a “post what needs to be done” and assign responsibilities. I think all discussions should happen in an open, across-the-board manner. Otherwise, it might appear that cliques are forming within the group and not all people are given adequate voice. That might change if there is a grant application and a hierarchy needs to be established, but I think it is good collaborative practice that all stakeholders know what is going on in the building process at all times.

Alex Christie agreed:

The diversity of outlook and expertise among this group is perhaps our greatest asset. We have technical-minded folk, people who can secure institutional support, folks who want to carefully edit and prepare scholarly texts, etc. If each of these groups can bring their best stuff to the group, to the inclusion of complementary resources, we can build something truly standout.

A consensus began to emerge in favor of respecting how different people are interested in different aspects of the process and have different visions for the range, number, and genre of texts to include. Given James’s and Stephen’s sensitivity to the practical use of time for those of us under various institutional pressures, divvying up work should allow various people to work out their interests: Claire’s interest in copyright, Chris’s and mine in digital textual editing, Andrew’s in web app development, Stephen’s in finding institutional resources, James’s in finding first editions, et cetera. We advocate for what we individually believe in, and we can guarantee its (eventual) manifestation if we put our money (read: time) where our mouths are.

James expressed no problem with our selection of texts being “opportunistic in the early stages and hence idiosyncratic.” As Matt wrote,

I would vote NOT to limit people on what they can do. If someone wants to scan the wasteland, God bless them. We all have individual passions. Let’s harness that. Why say no when we have a group of passionate, experienced yeses?

In this Joycean affirmation, Chris’s reiterated suggestion to limit the type of texts we include (e.g., no book-length works, no poetry, perhaps no fiction, nothing whose copyright status is unknown to us) was emphatically vetoed. But it may satisfy Chris, as our resident Spirit of Discipline and Forethought, that really the only proposals of his that have been completely vetoed were 1) mandatory TEI and 2) the need for an unambiguous doctrine controlling which genres to include or exclude from the anthology.

Do we risk a tower of Babel? For certain. But our biggest enemy is losing interest.

Hosting Facsimiles (an ultimatum)

And I felt we were going to lose our momentum. With the clock ticking without our competing workflows being reconciled or even clearly being compared explicitly, I pounded out a short, blunt proposal: Scan a “first-edition out of copyright” of your choice. Use the online form Andrew already constructed to submit it in PDF along with eight points of metadata: title; author name (family and given); work (original publication context, e.g., name of little magazine); publisher; city; year; pages (within original context); name of scanner/uploader. I hoped that this stark proposal would shake some issues out (I called it “giving something very hard and wall-like for people to bump up against”) by testing if we had consensus on one particular point: whether or not we need fresh scans of first editions. No question was being answered definitively as we all made strong arguments for varying courses of action. While this kind of complexity and ambiguity is good for scholarship, we weren’t arriving at a clear course of action.
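To make the proposal concrete, here is a rough sketch of what one submission record might look like, using the metadata fields listed above. This is an illustration only: I have not reproduced Andrew’s actual form, so the class name `Submission`, the field names, and the `missing_fields` helper are all hypothetical, and the sample values are placeholders rather than real bibliographic data.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Submission:
    """One scanned text plus the metadata requested in the proposal (hypothetical schema)."""
    title: str
    author_family_name: str
    author_given_name: str
    work: str         # original publication context, e.g. name of little magazine
    publisher: str
    city: str
    year: int
    pages: str        # page range within the original context
    scanned_by: str   # name of scanner/uploader


def missing_fields(submission: Submission) -> List[str]:
    """Return the names of text fields that were left blank."""
    blank = []
    for field in fields(submission):
        value = getattr(submission, field.name)
        if isinstance(value, str) and not value.strip():
            blank.append(field.name)
    return blank


# Placeholder values only; none of this describes a real edition.
example = Submission(
    title="Example Essay",
    author_family_name="Author",
    author_given_name="Example",
    work="Example Little Magazine",
    publisher="Example Press",
    city="",
    year=1920,
    pages="1-10",
    scanned_by="Example Uploader",
)
```

Here `missing_fields(example)` reports that `city` was left empty. A check like this would only confirm that a form is complete; it says nothing about whether the scan itself is a first edition or out of copyright, which was the real question.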

Chris, game as ever, responded with an emphatic defense of using “existing publicly available scans of what (I think) I can justify bibliographically as good sources.” Doing so would not automatically preclude the option to make fresh scans of any particular text that we could easily scan ourselves or that does not have a reliable PDF version already available. (He also provided an amazing proof-of-concept “sample anthology” in his email, which I will discuss in the next blog post.) I responded to the group by saying that I agreed that this was the best option, as it saves time and reflects the uneven levels of digital availability of various texts. Matt and Claire also agreed to this practice, suggesting that we clearly attribute our source texts (by name and by linking). However, I was careful to specify what I considered to be the necessary corollaries of such a decision:

I think new scans are necessary IF a) we ever envision them becoming a part of the user experience OR IF b) we want to simplify & bulletproof our bibliographic arguments as to copyright compliance.

a) With the former, the scans we use for transcription purposes don’t have to be our own IF we are committing to some level of text markup at even the earliest stage of the project (we’ve been specifying Markdown). That’s great with me, but I hadn’t heard many people on this thread agreeing with Markdown. Because it is our transformation of that image by labor into our own, new version, it doesn’t matter who owned the image or PDF we initially used (assuming copyright is OK otherwise). How I’m seeing it: if we don’t make new scans, we are going to have to proceed directly to some form of markup rather than be able to roll out an early version of the site with newly scanned facsimile images of first editions in PDF (or TIFF) [….]

b) With the latter, I think it’s about whether we are OK with managing a case-by-case approval of specific currently available versions (someone else’s PDF, someone else’s OCR, someone else’s HTML version). We could have a flow chart that explains what forms of sources will be approved by the board, or we could ask for the board (or a smaller version of it) simply to approve or reject a particular source for a particular text. (“I used X source [here’s the link] and I trust it for Y reasons but I checked its accuracy by looking at Q first edition from my library/HathiTrust/my private collection.”)

While this response strikes me now as being overly priggish (n.b.: it’s quite awkward to represent as neutrally as possible one’s own neuroses & blindnesses), I can still try to recover my rationale: If we allow flexible, multiple methods of sourcing the texts we’ll use for our site, I still do not believe that our end product for our users should be multiple and flexible to the same degree, as I think that’s too confusing and complicated. Imagine what our website will look like if some of the texts have facsimile images, others have links to facsimile images, and others have no access to facsimile images at all. Imagine the complexity of instructors trying to cobble together a custom coursepack out of differing sets of available file types. I’m not even sure we could pull that off technically.

Claire cautioned, however,

I’m worried in the framework Shawna suggests that we’re forgetting that Matt (and Stephen?) are advocating for page images as an output that people can use! I wonder if there is a way to have both - if we did markdown, pandoc, pdf for everything, and then where we had good, shareable page images, we could, well, share them. Or at least provide external links to them (if we’re using archive.org, etc?). This presents the problem that we don’t have good, high-res digitized versions of everything, which means more digitization for us if we want each text to have both options.

James’s response was a preference “to have original images available.” Chris, however, agreed with me that

linking back to a page image wherever possible (which will be possible in, I imagine, most cases) is vital and consistent with our desire for openness and transparency; I think model that Shawna describes (markdown -> output formats; not slicing/dicing page images, be they existing or newly digitized) is the way to go. But we need to be ahem on the same page here.

Andrew brought up the question that was bothering me—about the technical barriers to using facsimile images:

Which brings me to the point about hosting images: this is way harder than you all think. I mean, sure if you want a bunch of PDFs in a directory on a website, fine. But to do anything usable with them is a TON of work, as Chris indicated. PDF is a nightmare of a file format and doing anything meaningful with it (pulling certain pages out and recompiling them / displaying) is frankly difficult. Also, if they’re just page images, they’re unsearchable without some kind of plain text transcription. So, one way or another, we have to have these essays in plain text transcriptions (whether OCR or human-generated). To put it plainly, we (and by “we” I think I mean Chris and I) would have to do a lot more work to get the same end product. Which is, frankly, stupid.

I summarized,

My reasoning is that if we commit to hosting images ourselves, 1) that’s adding more hosting space by a non-trivial amount, 2) we’d have to make sure we have ownership over the images, which would slow down the process considerably, 3) also bringing up the specter of inconsistency, as in, we have some texts with image, PDF, and HTML, others without the image, etc, and then it could get confusing for our users (how would they know what formats to expect?), 4) as well as complicate our website design unnecessarily (Andrew?) and 5) also bring up the specter of page numbers for images if people expect to put them in the coursepack (Andrew and Chris have pretty much found a way to get page images through the coursepack maker process, but that doesn’t involve images but rather text manipulated through pandoc). However, the project needs to have images stored internally somewhere in case our source website goes down.

At this point, as Chris and Andrew agreed with me, and Claire agreed if we scrupulously linked back to source images, it seemed that we had reached a consensus: no publicly hosted facsimile images. James added the proviso that institutions through which we host our anthology “may very well wish to also host the page images/sources when it’s convenient or useful to do so.”

Observations on project management

Ironically (though it is probably the opposite of irony, but I find myself at a loss because the word “literally” means nothing anymore), just as we acknowledged the strength in diversity, in allowing people to flex their various muscles as they wish, this particular consensus regarding content led away from methodological consensus. Discussions of the prototypes—the prototypes!—were sinking into a bit of a diminuendo. I felt nostalgic for the comparative simplicity & briskness of the November 10-14 exchanges among the four of us (Claire, Chris, Andrew, and me), who kept half-seriously & wholly-dolefully whispering to each other, were I the benevolent dictator….

It strikes me that project management is quite the libidinal economy. Trying to guide our discussions, I tried to balance the need to get on the same page (and confirm that said same page was flipped to) with the need to keep up the excitement. For example, when it seemed all a Forsterian muddle, Chris and I couldn’t, for the life of us, make people want to vote on particular topics. Voting, being too democratic yet also too autocratic, is so not modernist. Marinetti didn’t put his Manifesto to a committee, and Lewis just walked out when the Omega Workshops annoyed him for the last time. But if the desire for a crowd-sourced, rapidly prototyped anthology is anarchic, must the methods be anarchic as well?

Was it the holidays (the strange way in which we converged on debates during American Thanksgiving)? Was it the mere passage of time? What is reenergizing us is the addition of new people into the network—not necessarily adding more to the board, but on the periphery, feeding us their excitement and ideas. Jim Benstead is connecting us with SNOMS and DHnetS, brilliantly finding a great use for the training materials and tutorials over which he has assumed leadership. Stephen’s reiteration of Jeff Drouin’s interest (over at MJP) was echoed by Cathryn Setz’s interest (over at BAMS). We were astonished by the number of positive reactions to the blurb Claire wrote and Cathryn Setz put up on BAMS. Though we expressed some initial reservations at sharing the spreadsheet with BAMS (wanting to protect Matt’s fantastic work making a powerful, detailed Google spreadsheet), Matt and I also emphasized that we have something to show people when they ask. (This impulse also led to me volunteering to write these blog posts as historical documentation as well as for raw material for future website copy.)

BAMS’s excitement would not be denied. Claire wrote the next day, beleaguered by well-wishers,

I’ve received at least a dozen emails since the blurb went out yesterday with offers to help with editing and some writing just to express their delight that we’re undertaking this! […] I’ve added editors to the spreadsheet when they’ve expressed interest in particular texts, but all I’ve asked them to do for now is trying to locate good source texts to work from.

Chris concurred,

If folks want to recommend/suggest a text, we can just add it to the list; if anyone wants to take responsibility for a text, that’s fine too (better even!)—we just need to figure out what that entails: we’ll be able to point to documentation… once we decide on how this workflow works and what we’re doing.

Though we did not note it at the time, asking for public suggestions of texts will mitigate some weaknesses in our current spreadsheet of target texts (which Chris has called “an uneven, and at times outright weird list”). It also led into a fresh round of technical debates about the workflow. But do not fear, the arc of history bends towards justice etc, as Part V will also show.
