Introducing the Bibliography on Stylometry – The Dragonfly's Gaze

Stylometry is the application of the study of linguistic style, usually to written language. The spread of the Internet has shifted the attention of authorship attribution research towards online texts (web pages, blogs, etc.) and electronic messages.

In practice, a large share of the entries focus on stylometry understood as the theory and practice of authorship attribution using so-called non-traditional, quantitative methods.

The bibliography also contains some forays into areas not limited to literary texts, such as forensic linguistics or cognitive stylistics. Publications on statistics, machine learning, natural language processing or mainstream stylistics, literary theory and history are included only if they have a direct connection with an issue in stylometry as defined above. For a bibliography of wider scope but weaker coverage, see the more general Doing Digital Humanities bibliography.

The bibliography is open in scope with regard to publication format, publication date, and language of publication. That is, besides monographs, journal articles and book chapters, it also includes, for instance, published conference papers, publicly available technical reports, and blog posts.


Likewise, there are currently 38 entries with a publication date between … and …, and if we can identify more, they will be very welcome.

Where and in which formats is the bibliography available? The bibliography is freely available online in a variety of forms and formats. First of all, it can be browsed online here: …

The bibliography currently contains around … items.

This means it is by no means an exhaustive bibliography of all publications on the subject, but it does cover more ground than any other subject-specific bibliography available either in print or online. The bibliography is also meant to grow continually and to improve its coverage of the field.

As of April …, the bibliography contains journal articles, conference papers, monographs, book chapters, B.

There are even two patents and one radio broadcast. A fun fact: printed out on paper, this bibliography runs to almost … pages.

Helander was first convicted of writing the letters and lost his position as bishop, but was later partially exonerated.


The letters were studied using a number of stylometric measures as well as typewriter characteristics. The various court cases and further examinations, many of them commissioned by Helander himself in the years up to his death in …, discussed stylometric methodology and its value as evidence in some detail.

After his personal notes were made public on his 90th birthday in …, a study used stylostatistical methods to determine which of those talks were written by him and which were written by various aides.

This case was only resolved after a handwriting analysis confirmed the authorship.

In …, stylometric methods were used to compare the Unabomber manifesto with letters that one of the suspects, Theodore Kaczynski, had written to his brother, which led to his apprehension and later conviction.

In …, a group of linguists, computer scientists, and scholars analysed the authorship of the novels published under the pen name Elena Ferrante.

Using a corpus compiled at the University of Padua containing novels by 40 authors, they analysed Ferrante's style on the basis of seven of her novels and compared it with the style of the 39 other novelists, using tools such as stylo, an R package for stylometric analysis.

Their conclusion was that Domenico Starnone is the secret hand behind Elena Ferrante.

Most methods are statistical in nature, such as cluster analysis and discriminant analysis. They are typically based on philological data and features, and they are a fruitful application domain for modern machine learning approaches. Whereas in the past stylometry emphasized the rarest or most striking elements of a text, contemporary techniques can isolate identifying patterns even in common parts of speech.


Most systems are based on lexical statistics. In this context, unlike in information retrieval, the observed occurrence patterns of the most common words are more interesting than the less frequent topical terms.
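As a minimal sketch of such lexical statistics, the Python below computes the relative frequencies of the most common words in a text. It assumes a deliberately simple, ASCII-only tokenizer (a regular expression chosen for illustration, not taken from any particular stylometry system), and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of ASCII letters/apostrophes as word tokens.
    return re.findall(r"[a-z']+", text.lower())


def most_common_word_profile(text: str, n: int = 10) -> dict[str, float]:
    """Return the relative frequencies of the n most common words in text."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    # most_common() orders by count; ties keep first-seen order.
    return {word: count / total for word, count in counts.most_common(n)}


profile = most_common_word_profile("The cat sat on the mat. The dog sat on the log.", n=3)
print(profile)  # 'the' dominates; topical words like 'cat' and 'dog' fall below the cut
```

Even in this toy sentence the frequent, non-topical words fill the profile, while the content words that an information retrieval system would care about drop out.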

An example of a writer invariant, that is, a property that stays roughly constant across a writer's texts, is the frequency with which the writer uses function words.
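To illustrate the idea, here is a hedged Python sketch that turns a text into a vector of function-word rates and compares two such vectors. The short `FUNCTION_WORDS` list and the mean-absolute-difference measure are hypothetical choices for illustration only; they are not a standard list or a named stylometric distance.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list of English function words.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "it", "is", "was")


def function_word_profile(text: str, words=FUNCTION_WORDS) -> list[float]:
    """Relative frequency of each function word among all word tokens in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty input
    return [counts[w] / total for w in words]


def profile_distance(p: list[float], q: list[float]) -> float:
    """Mean absolute difference between two profiles (an illustrative measure)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


known = function_word_profile("It was the best of times and it was the worst of times.")
disputed = function_word_profile("It was the age of wisdom and it was the age of foolishness.")
other = function_word_profile("Buy now to save big in a sale that ends today.")
print(profile_distance(known, disputed))  # 0.0: identical function-word rates
print(profile_distance(known, other) > 0)  # True
```

The two Dickens sentences differ in every content word yet share identical function-word rates, which is exactly why such rates are attractive as writer invariants. Real studies use far longer texts and word lists.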


In one such method, the text is analyzed to find the 50 most common words. The text is then broken into chunks of 5,… words, and each chunk is analyzed to find the frequency of those 50 words in that chunk. This generates a 50-number identifier, i.e. a vector of word frequencies, for each chunk.
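The chunking step can be sketched in Python as follows. This is an illustrative reading of the method described above, not a reference implementation: the function name `chunk_vectors` is hypothetical, the chunk size is left as a required parameter, and dropping an incomplete final chunk is an assumption of this sketch.

```python
import re
from collections import Counter


def chunk_vectors(text: str, chunk_size: int, n_words: int = 50) -> tuple[list[str], list[list[float]]]:
    """Split text into fixed-size word chunks and describe each chunk by the
    relative frequencies of the n_words most common words of the whole text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # The reference vocabulary is chosen once, over the entire text.
    top_words = [word for word, _ in Counter(tokens).most_common(n_words)]
    vectors = []
    # Only complete chunks are used; a shorter trailing remainder is dropped.
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        counts = Counter(tokens[start:start + chunk_size])
        vectors.append([counts[word] / chunk_size for word in top_words])
    return top_words, vectors


# Toy example: 40 tokens, 3 reference words, chunks of 8 words.
words, vecs = chunk_vectors("a b a c " * 10, chunk_size=8, n_words=3)
print(words)    # ['a', 'b', 'c']
print(vecs[0])  # [0.5, 0.25, 0.25]
```

With `n_words=50`, each returned vector has 50 entries, one per reference word, so every chunk becomes a point in a 50-dimensional space.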

These numbers place each chunk of text as a point in a 50-dimensional space. This space is then flattened into a plane using principal component analysis (PCA). The result is a display of points that correspond to an author's style.
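A minimal sketch of the flattening step, assuming NumPy is available: PCA is computed here via a singular value decomposition of the mean-centred chunk vectors, which is one standard way to obtain principal components. The function name `pca_2d` and the four frequency vectors are hypothetical, chosen only to show the mechanics.

```python
import numpy as np


def pca_2d(vectors) -> np.ndarray:
    """Project row vectors (one per text chunk) onto their first two principal components."""
    X = np.asarray(vectors, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each feature (word frequency) on zero
    # The rows of Vt are the principal axes, in order of decreasing variance explained.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T  # shape (n_chunks, 2): coordinates for a scatter plot


# Hypothetical frequency vectors for four chunks (three reference words each).
chunks = [
    [0.50, 0.25, 0.25],
    [0.48, 0.27, 0.25],
    [0.10, 0.60, 0.30],
    [0.12, 0.58, 0.30],
]
coords = pca_2d(chunks)
print(coords.shape)  # (4, 2)
```

The signs of principal components are arbitrary, so a plot may come out mirrored; what matters is which chunks lie close together. Chunks with similar word-frequency vectors stay close in the plane.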