1. A Selection of Tools
1.1 Onodo: Import, Export and Manipulation
1.2 Polinode: Import, Export and Manipulation
1.3 Gephi: Import, Export and Manipulation
2. Gephi, Publication and Sharing
2.1 Why Gephi?
2.2 Using Sigma.js for Interactive Visualization and its Challenges
2.3 Publication, Sharing and Sustainability
My dissertation aims to investigate the benefits of network visualization and analysis for art history research. In particular, I am interested in relationships and connections between artists, which create a network of influence and creative inspiration. For this purpose, I have already compiled a preliminary dataset of 129 nodes and 253 edges representing 19th-century artists, groups, institutions and movements. To assess the usefulness of such a visualization and use it to answer research questions, I created prototypes with various digital tools and evaluated how suitable each tool is for my purposes. The tools and methods I use need to suit the dataset I created, and it is important to keep in mind that the visualization of art history, as opposed to most data visualization, is:
Not about producing illustrations or maps to communicate things that you have discovered by other means. It is a means of doing research; it generates questions that might otherwise go unasked, it reveals […] relations that might otherwise go unnoticed, and it undermines, or substantiates, stories upon which we build our own versions of the past. (White, 2010, para.36)
The evaluation below considers four main factors: the nature of the tool (open source or proprietary), import and export of the data, manipulation of the visualization, and sharing of the finished visualization. After evaluating the tools, I will explain why I chose Gephi and how I will use it. Finally, I will comment on the publication and sharing of the project.
My first choice for visualizing this project was Onodo, because of its very user-friendly, simple design and its open science agenda. Onodo is a browser-based network visualization program that enables users without technical knowledge to visualize complex networks; it was developed by a team of journalists at the Spanish civil society organization Civio (Using Onodo to Visualise Company Ownership Networks, 2018). It was initially developed to build influence maps that promote transparency and free access to public data in Spain following corruption scandals. Civio’s founder David Cabo (In: Onodo, an Open Source Network Analysis Tool for Non-tech Users, 2016) describes Onodo as an “open source tool that aims to facilitate network mapping and effective storytelling around relational information.” The project has been partly funded by CHEST, a European Union research project. Onodo is an ideal tool for people with no prior experience in network visualization and analysis, thanks to its user-friendliness, easy data import and export, and easy sharing, which is in line with the spirit of the Digital Humanities.
Uploading files to Onodo is simple and a template is available; a downside, however, is that it only accepts .xlsx files. Upon upload, the program performs a basic scan and reports errors. Once the dataset is uploaded, the visualization can be adjusted in a live editor by editing the data, but there is no option to download the edited dataset. Nor does the shared or embedded visualization allow users to download the dataset. The editor provides basic editing and analysis options, with limited possibilities to change the color and size of nodes according to metrics.
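To illustrate the kind of pre-upload check such a scan performs, the following Python sketch flags duplicate node ids, edges that point to nodes missing from the node table, and self-loops. These checks and the column names `Id`, `Source` and `Target` are my own assumptions, not Onodo's documented validation rules, and the example data is made up.

```python
from collections import Counter


def scan_network(nodes, edges):
    """Report basic problems in a node table and an edge table.

    nodes: list of dicts with an "Id" key; edges: list of dicts with
    "Source" and "Target" keys (the column names are assumptions).
    Returns a list of human-readable problem descriptions.
    """
    problems = []
    id_counts = Counter(node["Id"] for node in nodes)
    # A repeated id makes it ambiguous which node an edge points to.
    for node_id, count in sorted(id_counts.items()):
        if count > 1:
            problems.append(f"duplicate node id {node_id!r} ({count} rows)")
    # Edge rows are numbered from 1, as in a spreadsheet without its header row.
    for row, edge in enumerate(edges, start=1):
        source, target = edge["Source"], edge["Target"]
        for end in (source, target):
            if end not in id_counts:
                problems.append(f"edge {row}: unknown node {end!r}")
        if source == target:
            problems.append(f"edge {row}: self-loop on {source!r}")
    return problems


# Made-up example data, not taken from my dataset.
nodes = [{"Id": "A"}, {"Id": "B"}, {"Id": "A"}]
edges = [
    {"Source": "A", "Target": "B"},
    {"Source": "B", "Target": "C"},
    {"Source": "B", "Target": "B"},
]
for problem in scan_network(nodes, edges):
    print(problem)
```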
When finished, the visualization can be embedded into a website via HTML or shared via a permalink. Once embedded, the visualization is interactive and page visitors can share it via social media, HTML or permalink. It includes a handy search function, a collapsible legend, an information panel with the attributes of a selected node, and a full-screen option. As the visualization has to be published before it can be shared, Onodo cannot be used for sensitive data or for projects that should only be visible to a select audience.
The main downside of Onodo is its very simple nature, which makes it easy to use but does not allow for much manipulation and analysis, such as filters, design options and different layouts. Additionally, the platform is not very widespread and lacks an active online community that could be consulted for help with issues and questions.
As mentioned above, the possibilities for editing and adjusting the visualization in Onodo are very limited, which led me to look for another, more complex tool. I found the online network visualization and analysis tool Polinode, which seemed well suited to this project because of its advanced editing features and better design and display options. It runs in the cloud and offers an API to programmatically create, retrieve, update and delete networks. The main downside of Polinode is that it is proprietary and requires a paid subscription. A basic Polinode account allows the user to create five public networks and save one view, but no private networks are available with this plan.
Polinode accepts Excel, GEXF and JSON files, scans each file for mistakes and irregularities, and reports each finding with a description of the error, which makes it easier to correct and improve the dataset. After importing the data, Polinode offers a variety of metrics, such as degree, communities and the identification of influencers, as well as layout and design options, including coloring, sizing and filtering nodes by calculated metrics and attributes. Every action can be added to the view as a layer, which makes it easier to keep track of the manipulations performed.
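As an illustration of what "degree" and "size by metric" mean, the following Python sketch counts each node's degree (incoming and outgoing edges combined) and maps the values linearly onto a node-size range. It is not Polinode's implementation; the size bounds of 10 and 50 and the node ids are arbitrary assumptions.

```python
from collections import Counter


def degrees(edges):
    """Total degree (number of incident edges) of every node in an edge list."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)


def size_by_metric(values, min_size=10.0, max_size=50.0):
    """Linearly map metric values onto a node-size range (bounds are arbitrary)."""
    low, high = min(values.values()), max(values.values())
    span = (high - low) or 1  # all values equal: every node gets min_size
    return {node: min_size + (value - low) / span * (max_size - min_size)
            for node, value in values.items()}


# Made-up edges; the ids are placeholders.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")]
deg = degrees(edges)
print(deg)                  # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
print(size_by_metric(deg))  # {'A': 50.0, 'B': 30.0, 'C': 30.0, 'D': 10.0}
```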
With the basic plan, one view can be saved, which creates a permalink that can then be shared. Anyone with the link can manipulate the visualization and download it as SVG or PNG; however, it is not possible to download the dataset. While working on the visualization, all data in Polinode can be exported to Excel, and the network can be captured as PNG, SVG, Excel, GEXF or GraphML. The tool also allows other users to be added to a network for collaboration. Furthermore, Polinode automatically creates network reports, which can be exported as .xlsx files. The biggest downside of Polinode, however, is that a visualization cannot be embedded into a website. The only way to share the visualization while preserving its interactivity is to share a permalink to a saved view, which anyone who opens it can then edit. Finally, due to its proprietary nature and focus on business customers, there is little easily accessible support.
I wanted a tool that is open source, like Onodo, but has advanced editing and analysis options, like Polinode, which led me to Gephi. Gephi is an open-source desktop application that runs on the user's own machine and uses OpenGL for rendering. This means that no internet connection is necessary, although the size of the networks it can handle depends on the processing power of the machine used. The extensive online community, documentation and support make Gephi easy to learn, even if it takes more time to master than Onodo and Polinode. Furthermore, a large number of plugins and extensions is available for advanced users.
The program allows extensive live manipulation of the network, as well as editing of the data in the Data Laboratory tab. Nodes and edges can be added directly in both the Data Laboratory and the Overview. The Overview tab is used for statistics, layout and filtering, and screenshots can be taken at any time, which is very useful for documenting the process. Besides various layout algorithms, Gephi offers multiple filters and statistics as well as a wide variety of design choices for labels, colors and background. The Preview tab is used for fine-tuning and exporting as PDF, SVG and PNG, which allows further editing in graphics software. Furthermore, the Data Laboratory reflects all changes made in the Overview tab and can be exported as CSV at any time. Another great feature is the timeline, which can be enabled when a time interval is specified for the nodes.
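Since the timeline depends on each node carrying a time interval, the dataset needs start and end dates per node. The Python sketch below writes made-up node records to CSV with `Start` and `End` columns and rejects intervals that end before they start. The column names and record keys are assumptions; how Gephi combines such columns into the interval its timeline uses depends on the Gephi version and import settings, so that step is not shown.

```python
import csv
import io


def nodes_with_intervals(nodes):
    """Write node records as CSV text with Start and End columns.

    nodes: list of dicts with hypothetical keys "id", "label", "start", "end",
    where start and end are years. The output column names are assumptions.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Id", "Label", "Start", "End"])
    for node in nodes:
        # An interval that ends before it starts cannot be placed on a timeline.
        if node["end"] < node["start"]:
            raise ValueError(f"interval for {node['id']!r} ends before it starts")
        writer.writerow([node["id"], node["label"], node["start"], node["end"]])
    return buffer.getvalue()


# Made-up example records.
print(nodes_with_intervals([
    {"id": "n1", "label": "Artist A", "start": 1850, "end": 1890},
    {"id": "n2", "label": "Group B", "start": 1874, "end": 1886},
]))
```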
Gephi was made primarily for visualization rather than analysis, but its range of statistics is more than sufficient for my purposes. Data import works with CSV files, while data can be exported in multiple formats, including GEXF, Gephi’s own format, and, with the SigmaExports plugin enabled, as a sigma.js template.
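GEXF is an XML format, so a minimal file can also be produced outside Gephi. The following Python sketch, using only the standard library, builds a static, directed GEXF 1.2 graph with one node attribute for the data source. The attribute title `source`, the node ids and the URLs are hypothetical, and a file exported by Gephi will usually contain additional elements, such as node positions and colors.

```python
import xml.etree.ElementTree as ET

NS = "http://www.gexf.net/1.2draft"  # GEXF 1.2 namespace
ET.register_namespace("", NS)        # serialise it as the default namespace


def q(name):
    """Qualify a tag name with the GEXF namespace."""
    return f"{{{NS}}}{name}"


def to_gexf(nodes, edges):
    """Build a minimal static, directed GEXF 1.2 document as a string.

    nodes: list of (id, label, source_url) tuples; source_url is stored as a
           hypothetical node attribute titled "source".
    edges: list of (source_id, target_id) tuples.
    """
    root = ET.Element(q("gexf"), version="1.2")
    graph = ET.SubElement(root, q("graph"), mode="static", defaultedgetype="directed")
    # Attribute declarations come before the nodes element.
    declared = ET.SubElement(graph, q("attributes"), {"class": "node"})
    ET.SubElement(declared, q("attribute"), id="0", title="source", type="string")
    nodes_el = ET.SubElement(graph, q("nodes"))
    for node_id, label, url in nodes:
        node = ET.SubElement(nodes_el, q("node"), id=node_id, label=label)
        values = ET.SubElement(node, q("attvalues"))
        ET.SubElement(values, q("attvalue"), {"for": "0", "value": url})
    edges_el = ET.SubElement(graph, q("edges"))
    for i, (source, target) in enumerate(edges):
        ET.SubElement(edges_el, q("edge"), id=str(i), source=source, target=target)
    return ET.tostring(root, encoding="unicode")


# Made-up example network.
print(to_gexf(
    [("a", "Artist A", "https://example.org/a"),
     ("b", "Artist B", "https://example.org/b")],
    [("a", "b")],
))
```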
A JavaScript GEXF viewer makes it possible to embed an interactive visualization in an HTML page, the link to which can then be shared. The live visualization shows a detailed list of attributes for each selected node, including a live link to the specified data source, as well as its connected nodes. However, it does not offer a download of the dataset, and I have not yet found a way to make the timeline accessible in this view.
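The information panel described above essentially looks up a node, its attribute values and its neighbours in the GEXF file. The following Python sketch reproduces that lookup with the standard library; it does not reflect the viewer's actual JavaScript code, and the sample file, ids and URL are made up.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://www.gexf.net/1.2draft"}

# A made-up GEXF 1.2 file with one node attribute ("source").
SAMPLE = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="static" defaultedgetype="directed">
    <attributes class="node">
      <attribute id="0" title="source" type="string"/>
    </attributes>
    <nodes>
      <node id="a" label="Artist A">
        <attvalues><attvalue for="0" value="https://example.org/a"/></attvalues>
      </node>
      <node id="b" label="Artist B"/>
      <node id="c" label="Group C"/>
    </nodes>
    <edges>
      <edge id="0" source="a" target="b"/>
      <edge id="1" source="c" target="a"/>
    </edges>
  </graph>
</gexf>"""


def node_panel(gexf_text, node_id):
    """Collect what an information panel might show for one node: its label,
    its attribute values keyed by attribute title, and the labels of all
    nodes connected to it in either direction."""
    root = ET.fromstring(gexf_text)
    titles = {a.get("id"): a.get("title")
              for a in root.findall(".//g:attributes[@class='node']/g:attribute", NS)}
    all_nodes = {n.get("id"): n for n in root.findall(".//g:node", NS)}
    node = all_nodes[node_id]  # raises KeyError if the id is not in the file
    attributes = {titles.get(v.get("for"), v.get("for")): v.get("value")
                  for v in node.findall("g:attvalues/g:attvalue", NS)}
    connected = set()
    for edge in root.findall(".//g:edge", NS):
        source, target = edge.get("source"), edge.get("target")
        if source == node_id:
            connected.add(all_nodes[target].get("label"))
        if target == node_id:
            connected.add(all_nodes[source].get("label"))
    return {"label": node.get("label"), "attributes": attributes,
            "connected": sorted(connected)}


print(node_panel(SAMPLE, "a"))
```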
For my particular project, I consider Gephi the most suitable tool because of its variety of analysis and visualization options and the possibility of sharing the visualization as a webpage with the help of the sigma.js library, without extensive coding knowledge. Additionally, I find a desktop application comfortable to use, as it allows me to work on the visualization without depending on an internet connection. As my visualization will consist of no more than 300 nodes and 500 edges, the CPU of an ordinary laptop is more than enough to run the program smoothly, and the browser visualization does not take up much space on my Reclaim Hosting server.
I like Gephi’s interface because it is logically structured and makes it easy to gain an overview. The wide range of design options allows me to make the visualization legible and clearly structured. Additionally, a considerable amount of work has already been done with Gephi in the Digital Humanities and Digital Art History. This can be of great help for my methodology, as there are examples to draw on, such as Matthew Lincoln’s (2013) visualization and analysis of the Getty Union List of Artist Names.
Nevertheless, the online display through sigma.js has a few downsides. Firstly, I have not yet found a way to embed the visualization into my existing website instead of displaying it on a separate page. There is a Gephi plugin for WordPress, but unfortunately I was not able to find any documentation or help on how to use it. As it was last updated three years ago and has only 20+ active installations, I assume that it no longer works, which is a real pity. I tried to embed the page that shows the interactive visualization into my WordPress page with an iframe but have not yet succeeded. I suspect that this may be a problem with WordPress.
Secondly, the timeline, which can be enabled in Gephi itself, is not available in the interactive display, as there is no plugin for this yet. Finally, although every node links to its data source because I specified the source as an attribute, the browser view does not provide an option to download the dataset.
Once the visualization is finished, I will create a WordPress website to display and publish the project. The website will give information on the project and contain a PDF version of the dissertation for download, a link to the interactive visualization, and an option for visitors to leave comments. Furthermore, I will publish the datasets of my project on GitHub and link them to the website to make the data accessible (data can be accessed at https://github.com/ckdigitalarts/network). I chose WordPress because it comes with Reclaim Hosting and I can easily create the webpage as a subdomain of my existing website for this course. This helps me keep an overview and present a consistent online identity that can be traced back to me as a person.
GitHub is a suitable place to host my project data, and the fact that it hosts public repositories free of charge is in line with the spirit of the Digital Humanities. Alternatives would be GitLab, open-source software that can be installed on one's own server and thus run on a custom domain, or Bitbucket, which works in the cloud or on a local server. While both GitLab and Bitbucket offer migration of data from GitHub, GitHub is still the most popular free source code hosting site, which means that most visitors to my website who want to download the data will be familiar with it.
As a single researcher without in-depth data science and coding skills, I can only create a pilot visualization as an example of what is potentially possible and of how this kind of network visualization could be used for art history research. The project is intended to be complete in itself; however, by making the dataset freely available, I hope to enable interested researchers to continue it. Finally, a further aspect of my dissertation is to examine the potential of this project as a Virtual Research Environment, which opens up possibilities for continuation and extension as well as further research.
Anon., 2016. Onodo, an Open Source Network Analysis Tool for Non-tech Users. [blog] Influence Mapping. Available at: <https://www.influencemapping.org/blog/2016/04/30/onodo.html> [Accessed 7 Nov. 2018].
Anon., 2018. Using Onodo to Visualise Company Ownership Networks. [online] Publish What You Pay. Available at: <http://www.publishwhatyoupay.org/pwyp-resources/using-onodo-visualise-company-ownership-networks-2/> [Accessed 3 Nov. 2018].
Lincoln, M.D., 2013. Looking through the ULAN with Gephi, I. [online] Matthew Lincoln, PhD. Available at: <https://matthewlincoln.net/2013/06/20/looking-through-the-ulan-with-gephi.html> [Accessed 31 Mar. 2019].
White, R., 2010. What is Spatial History? [online] Spatial History Project. Available at: <http://stanford.edu/group/spatialhistory/cgi-bin/site/pub.php?id=29> [Accessed 3 Feb. 2019].