Authorship Attribution Study of the Majestic Documents
2.1. Source of the Majestic Documents for Testing

The Majestic documents tested were obtained online via www….com

2.2. Selection of the Majestic Documents for Testing

For authorship attribution testing to be undertaken, the document in question must have been attributed to some author. As such, only those Majestic documents that specifically bear the name of a signatory author were considered for testing. Any document that appeared important for validating the extraterrestrial hypothesis (ETH) as an explanation for UFOs was included in the testing, for example a document that mentioned the retrieval or transport of wreckage from Roswell or from some other event famous for its connection to the UFO question.

2.3. Overview of the Linguistic Testing Methods Used in the Study

The material in this section draws heavily upon the peer-reviewed article by Dr. Chaski. Dr. Chaski explains that, when it comes to document attribution in the legal world, methods for determining authorship “must work in conjunction with the standard investigative and forensic techniques which are currently available.” Determining the authorship of a typewritten document, whether originally or subsequently put into electronic form, can be approached in three ways: “... biometric analysis of the computer user; qualitative analysis of ‘idiosyncrasies’ in the language in questioned and known documents; and quantitative, computational stylometric analysis of the language in questioned and known documents.”

With respect to the Majestic documents, the first method is not possible, since there is no way to analyze actual keystroke pattern dynamics. This method is also technically non-linguistic. The second method “assesses errors and ‘idiosyncrasies’ based on the examiner’s experience.” This method also has the disadvantage of requiring the pre-existence of a stylistic database against which to measure presumed idiosyncrasies.
The third approach, stylometry, “is quantitative and computational, focusing on readily computable and countable language features, e.g. word length, phrase length, sentence length, vocabulary frequency, distribution of words of different lengths.” Stylometric analysis may also include analysis of function word frequency and punctuation. As one of the leaders in developing authorship attribution techniques that meet legal standards for evidence, Dr. Chaski has developed “a computational, stylometric method which has obtained 95% accuracy and has been successfully used in investigating and adjudicating several crimes involving digital evidence.”

One final word on the testing enterprise is necessary. It is acknowledged that many of the Majestic documents were not handwritten or even typed by the author to whom they are attributed. The typical practice, especially for presidents, would be to dictate the content of correspondence to a secretary, who would type and reproduce it. This reality is not at odds with Dr. Chaski’s testing methods, since memoranda and correspondence are not produced by distinct psycho-linguistic processes. In other words, there is no significant linguistic difference between dictating a letter as one would wish it to be written and typing those thoughts oneself.

2.4. Explanation of the Test Results

In testing the Majestic documents, the first step involved taking the KNOWN documents, those undisputedly authored by the person to whom they are attributed, and combining them into a “stylistic pool” of data for each author. The second step was to run computational stylistic comparisons between each UNVERIFIED document and its corresponding KNOWN document pool. The third step was to compare each KNOWN document pool to all the other KNOWN document pools for similarity scores.
The purpose of this step was to detect how similar or dissimilar the KNOWN document pools were to one another. The fourth step was to rank all of the resulting similarity scores. The similarity score of the UNVERIFIED document to its corresponding KNOWN document pool was ranked alongside the similarity scores of the KNOWN document pools compared to each other. That would be a “match” with respect to linguistic authorship validation.

2.5. Results

The results are illustrated in the next several pages. …
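
As a rough illustration of the four-step comparison described in Section 2.4, the following is a minimal sketch in Python. It is not Dr. Chaski's method, whose details are not given here. It assumes a handful of the countable features named in Section 2.3 (word length, sentence length, punctuation, function word frequency) and assumes cosine similarity as the comparison measure. The function-word list and the names features, cosine, and rank_scores are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately small function-word list (an assumption for
# illustration; it is not taken from the study or from Dr. Chaski's method).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for")


def features(text):
    """Turn a text into a small vector of countable style features."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    total = max(len(words), 1)  # avoid division by zero on empty input
    counts = Counter(words)
    vec = [
        sum(len(w) for w in words) / total,         # mean word length
        len(words) / max(len(sentences), 1),        # mean sentence length (words)
        sum(text.count(p) for p in ",;:") / total,  # punctuation marks per word
    ]
    # Relative frequency of each function word.
    vec.extend(counts[w] / total for w in FUNCTION_WORDS)
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_scores(known, unverified):
    """known: author -> list of texts; unverified: label -> (attributed author, text)."""
    # Step 1: combine each author's KNOWN documents into one stylistic pool.
    pools = {author: features(" ".join(docs)) for author, docs in known.items()}
    scores = []
    # Step 2: compare each UNVERIFIED document to its corresponding KNOWN pool.
    for label, (author, text) in unverified.items():
        scores.append((cosine(features(text), pools[author]),
                       f"{label} vs KNOWN {author}"))
    # Step 3: compare every KNOWN pool to every other KNOWN pool.
    for a, b in combinations(sorted(pools), 2):
        scores.append((cosine(pools[a], pools[b]), f"KNOWN {a} vs KNOWN {b}"))
    # Step 4: rank all similarity scores together, most similar first.
    return sorted(scores, reverse=True)
```

Calling rank_scores with a dictionary of KNOWN texts per author and a dictionary of UNVERIFIED texts returns one ranked list of (score, label) pairs, so the position of each UNVERIFIED comparison can be read against the KNOWN-to-KNOWN comparisons. The raw features are left unscaled here for brevity, so mean sentence length tends to dominate the cosine score; a real analysis would normalize the features and use far richer ones.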