---
layout: post
title: Visualizing textual variance/genetics with SankeyVariant
categories: [software, digital humanities]
tags: [software, digital humanities]
published: True
image:
  feature: post_images/2015-12-Sankey.png
---

Over the holiday period I wanted to visualize the differences between two editions of a text that I had found to be very different (more on this in the new year). I couldn't find a ready-made solution, so I put together a small piece of software to achieve this: [SankeyVariant](https://github.com/MartinPaulEve/SankeyTextualVariant).

The software is based on d3.js and the Sankey plugin and, with the modifications that I have introduced, it allows the production of diagrams like this:

[![Sankey Diagram](/images/post_images/2015-12-Sankey.png)](/images/post_images/2015-12-Sankey.png)

The idea here is that, in this case, chunks of identifiable text in one edition (on the left) are mapped onto chunks in the corresponding edition (on the right). In this instance, I have numbered paragraphs but there is no reason that it couldn't be done with actual textual correlation. All that is needed is a way to produce the relevant JSON for your use case.
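As a rough illustration of what producing that JSON might involve, here is a minimal Python sketch. It is not part of SankeyVariant: the function name `build_sankey_json`, the input shapes and the chunk names are all hypothetical. It assumes you already know which chunks sit in which column and which chunks correspond to one another, and it emits the three keys (`nodes`, `links` and `nolink`) that appear in my own dataset further down.

```python
import json


def build_sankey_json(columns, alignments):
    """Build a dict with "nodes", "links" and "nolink" keys.

    columns    -- list of columns; each column is a list of (name, weight)
                  pairs, in the order the chunks should be drawn
    alignments -- list of (source_name, target_name, value) triples
    (Both input shapes are assumptions made for this sketch.)
    """
    known = {name for column in columns for name, _ in column}

    linked = set()
    for source, target, _ in alignments:
        # Refuse links that point at a chunk no column declares.
        for name in (source, target):
            if name not in known:
                raise ValueError(f"unknown chunk in alignment: {name!r}")
        linked.update((source, target))

    # Nodes are listed column by column, preserving the given order.
    nodes = [{"name": name} for column in columns for name, _ in column]

    links = [
        {"source": source, "target": target, "value": value}
        for source, target, value in alignments
    ]

    # Chunks that take part in no alignment are recorded with their
    # zero-indexed column and their weight.
    nolink = [
        {"source": name, "location": index, "value": weight}
        for index, column in enumerate(columns)
        for name, weight in column
        if name not in linked
    ]

    return {"nodes": nodes, "links": links, "nolink": nolink}


if __name__ == "__main__":
    # Hypothetical chunks: A* in the left edition, B* in the right one.
    left = [("A1 -> A3", 6), ("A4", 2), ("A5 -> A6", 4)]
    right = [("B1 -> B3", 6), ("B4 -> B5", 4)]
    alignments = [
        ("A1 -> A3", "B1 -> B3", 6),
        ("A5 -> A6", "B4 -> B5", 4),
    ]
    print(json.dumps(build_sankey_json([left, right], alignments), indent=2))
```

In this toy input, `A4` has no counterpart and so ends up in `nolink` with `location` 0. The sketch groups unlinked chunks by column; whether the order of `nolink` entries matters to the diagram is something I have not assumed either way.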
In my use case, the JSON looks like this:

    {
    "nodes":[
      {"name":"PQ1 -> PR3"},
      {"name":"PQ4 -> PR6"},
      {"name":"PQ7 -> PR7"},
      {"name":"PQ8 -> PR9"},
      {"name":"PQ10 -> PR10"},
      {"name":"PQ11 -> PR12"},
      {"name":"PQ13 -> PR17"},
      {"name":"PQ18 -> PR20"},
      {"name":"PQ21 -> PR24"},
      {"name":"PQ25 -> PR31"},
      {"name":"PQ32 -> PR33"},
      {"name":"PQ34 -> PR36"},
      {"name":"PQ37 -> PR37"},
      {"name":"PQ38 -> PR42"},
      {"name":"PQ43 -> PR44"},
      {"name":"PQ45 -> PR45"},
      {"name":"PQ46 -> PR46"},
      {"name":"PQ47 -> PR47"},
      {"name":"PQ48 -> PR52"},
      {"name":"PQ53 -> PR54"},
      {"name":"PQ55"},
      {"name":"PR55"},
      {"name":"PQ56 -> PR56"},
      {"name":"PQ57 -> PR57"},
      {"name":"PQ58 -> PR58"},
      {"name":"PQ59 -> PR59"},
      {"name":"PQ60 -> PR60"},
      {"name":"PQ61 -> PR62"},
      {"name":"PQ63 -> PR73"},
      {"name":"PQ74 -> PR74"},
      {"name":"PQ75 -> PR90"},
      {"name":"PQ91 -> PR120"},
      {"name":"E-EDITION BREAK"},
      {"name":"PQ121 -> PR127"},
      {"name":"PQ128 -> PQ129"},
      {"name":"PR129 -> PR131"},
      {"name":"PQ132 -> PR158"},
      {"name":"PQ159 -> PR159"},
      {"name":"PQ160 -> PR171"},
      {"name":"PQ172 -> PR173"},
      {"name":"PQ174 -> PR188"},
      {"name":"PQ189 -> PR190"},
      {"name":"PQ191 -> PR191"},
      {"name":"PQ192 -> PR192"},
      {"name":"PQ193 -> PR193"},
      {"name":"E-EDITION END"},
      {"name":"PQ194 -> PR203"},
      {"name":"PQ204 -> PR204"},
      {"name":"PQ205 -> PR205"},
      {"name":"PQ206 -> PR206"},
      {"name":"PQ207 -> PR208"},
      {"name":"PQ209 -> PR210"},
      {"name":"EQ1 -> ER3"},
      {"name":"EQ4 -> ER4"},
      {"name":"EQ5 -> ER5"},
      {"name":"EQ6 -> ER9"},
      {"name":"EQ10 -> ER10"},
      {"name":"EQ11 -> ER13"},
      {"name":"EQ14 -> ER20"},
      {"name":"EQ21 -> ER23"},
      {"name":"EQ24 -> ER29"},
      {"name":"EQ30 -> ER30"},
      {"name":"EQ31 -> ER31"},
      {"name":"EQ32 -> ER32"},
      {"name":"EQ33 -> ER34"},
      {"name":"EQ35 -> ER39"},
      {"name":"EQ40 -> ER40"},
      {"name":"EQ41 -> ER42"},
      {"name":"EQ43"},
      {"name":"ER43"},
      {"name":"EQ44"},
      {"name":"ER44"},
      {"name":"EQ45 -> ER45"},
      {"name":"EQ46 -> ER46"},
      {"name":"EQ47 -> ER48"},
      {"name":"EQ49 -> ER49"},
      {"name":"EQ50 -> ER60"},
      {"name":"EQ61 -> ER61"},
      {"name":"EQ62 -> ER77"},
      {"name":"EQ78 -> ER78"},
      {"name":"EQ79 -> ER108"},
      {"name":"EQ109 -> ER115"},
      {"name":"EQ116 -> ER117"},
      {"name":"EQ118"},
      {"name":"ER118 -> ER120"},
      {"name":"P-EDITION BREAK"},
      {"name":"EQ121 -> ER147"},
      {"name":"EQ148 -> ER159"},
      {"name":"EQ161 -> ER175"},
      {"name":"EQ176 -> ER176"},
      {"name":"EQ177 -> ER177"},
      {"name":"EQ178 -> ER178"},
      {"name":"EQ179 -> ER179"},
      {"name":"EQ180 -> ER189"},
      {"name":"EQ190 -> ER190"},
      {"name":"EQ191 -> ER191"},
      {"name":"EQ192 -> ER192"},
      {"name":"EQ193 -> ER194"},
      {"name":"P-EDITION END"}
    ],
    "links":[
      {"source":"PQ1 -> PR3","target":"EQ1 -> ER3","value":6},
      {"source":"PQ7 -> PR7","target":"EQ4 -> ER4","value":2},
      {"source":"PQ10 -> PR10","target":"EQ5 -> ER5","value":2},
      {"source":"PQ13 -> PR17","target":"EQ6 -> ER9","value":10},
      {"source":"PQ18 -> PR20","target":"EQ11 -> ER13","value":6},
      {"source":"PQ25 -> PR31","target":"EQ14 -> ER20","value":14},
      {"source":"PQ34 -> PR36","target":"EQ21 -> ER23","value":6},
      {"source":"PQ38 -> PR42","target":"EQ24 -> ER29","value":12},
      {"source":"PQ43 -> PR44","target":"EQ25 -> ER29","value":4},
      {"source":"PQ45 -> PR45","target":"EQ30 -> ER30","value":2},
      {"source":"PQ46 -> PR46","target":"EQ32 -> ER32","value":2},
      {"source":"PQ47 -> PR47","target":"EQ31 -> ER31","value":2},
      {"source":"PQ48 -> PR52","target":"EQ35 -> ER39","value":10},
      {"source":"PQ53 -> PR54","target":"EQ41 -> ER42","value":4},
      {"source":"PQ55","target":"EQ43","value":1},
      {"source":"PQ55","target":"EQ44","value":1},
      {"source":"PR55","target":"ER43","value":1},
      {"source":"PR55","target":"ER44","value":1},
      {"source":"PQ57 -> PR57","target":"EQ45 -> ER45","value":2},
      {"source":"PQ58 -> PR58","target":"EQ40 -> ER40","value":2},
      {"source":"PQ59 -> PR59","target":"EQ46 -> ER46","value":2},
      {"source":"PQ61 -> PR62","target":"EQ47 -> ER48","value":4},
      {"source":"PQ63 -> PR73","target":"EQ50 -> ER60","value":20},
      {"source":"PQ75 -> PR90","target":"EQ62 -> ER77","value":32},
      {"source":"PQ91 -> PR120","target":"EQ79 -> ER108","value":60},
      {"source":"PQ121 -> PR127","target":"EQ109 -> ER115","value":14},
      {"source":"PR129 -> PR131","target":"ER118 -> ER120","value":5},
      {"source":"PQ132 -> PR158","target":"EQ121 -> ER147","value":54},
      {"source":"PQ160 -> PR171","target":"EQ148 -> ER159","value":24},
      {"source":"PQ174 -> PR188","target":"EQ161 -> ER175","value":30},
      {"source":"PQ191 -> PR191","target":"EQ178 -> ER178","value":2},
      {"source":"PQ192 -> PR192","target":"EQ177 -> ER177","value":2},
      {"source":"PQ193 -> PR193","target":"EQ179 -> ER179","value":2},
      {"source":"PQ194 -> PR203","target":"EQ180 -> ER189","value":20},
      {"source":"PQ204 -> PR204","target":"EQ190 -> ER190","value":2},
      {"source":"PQ204 -> PR204","target":"EQ191 -> ER191","value":2},
      {"source":"PQ205 -> PR205","target":"EQ191 -> ER191","value":2},
      {"source":"PQ206 -> PR206","target":"EQ192 -> ER192","value":2},
      {"source":"PQ209 -> PR210","target":"EQ193 -> ER194","value":2}
    ],
    "nolink":[
      {"source":"PQ4 -> PR6","location":0,"value":6},
      {"source":"EQ10 -> ER10","location":1,"value":2},
      {"source":"PQ8 -> PR9","location":0,"value":2},
      {"source":"PQ11 -> PR12","location":0,"value":4},
      {"source":"PQ21 -> PR24","location":0,"value":8},
      {"source":"PQ32 -> PR33","location":0,"value":4},
      {"source":"PQ37 -> PR37","location":0,"value":2},
      {"source":"PQ43 -> PR44","location":0,"value":4},
      {"source":"EQ33 -> ER34","location":1,"value":2},
      {"source":"PQ56 -> PR56","location":0,"value":2},
      {"source":"PQ60 -> PR60","location":0,"value":2},
      {"source":"EQ49 -> ER49","location":1,"value":2},
      {"source":"EQ61 -> ER61","location":1,"value":2},
      {"source":"PQ74 -> PR74","location":0,"value":2},
      {"source":"EQ78 -> ER78","location":1,"value":2},
      {"source":"E-EDITION BREAK","location":0,"value":10},
      {"source":"PQ128 -> PQ129","location":0,"value":3},
      {"source":"EQ116 -> ER117","location":1,"value":4},
      {"source":"EQ118","location":1,"value":1},
      {"source":"P-EDITION BREAK","location":1,"value":10},
      {"source":"PQ159 -> PR159","location":0,"value":2},
      {"source":"PQ172 -> PR173","location":0,"value":4},
      {"source":"PQ189 -> PR190","location":0,"value":4},
      {"source":"E-EDITION END","location":0,"value":10},
      {"source":"PQ207 -> PR208","location":0,"value":4},
      {"source":"P-EDITION END","location":1,"value":10}
    ]}

In this dataset, the `nodes` array simply names each node in the text. Nodes should be listed column by column, in the order in which you want them to appear. The `links` array specifies how links should be drawn between nodes (you can also create multiple columns). The final `nolink` array gives a way to specify the weight (`value`) and position (`location`, a zero-indexed column number) of unlinked nodes.

The main modifications that I made to the standard d3.js sankey diagram implementation were:

* to allow the insertion of weighted unlinked entries
* to undo the reordering of nodes (by default sankey in d3.js tries to minimise crossovers, whereas I wanted chronology preserved)

I can think of many exciting uses for this, and I know that some others are also interested in working this up into a larger project. For now, though, I'll just say that I hope my first paper that uses this output will be coming out next year.

Merry Christmas all!