Authorship Attribution Study of the Majestic Documents


2.1. Source of the Majestic Documents for Testing

The Majestic documents tested were obtained online via www….com

2.2. Selection of the Majestic Documents for Testing

For authorship attribution testing to be undertaken, the document in question must have been attributed to some author. As such, only those Majestic documents that specifically bear the name of a signatory author were considered for testing.

Any document that appeared important for validating the extraterrestrial hypothesis (ETH) as an explanation for UFOs was included in the testing: for example, a document that mentioned the retrieval or transport of wreckage from Roswell or from some other event famous for its connection to the UFO question.

2.3. Overview of the Linguistic Testing Methods Used in the Study

The material in this section draws heavily upon the peer-reviewed article by Dr. Chaski. Dr. Chaski explains that, when it comes to document attribution in the legal world, methods for determining authorship “must work in conjunction with the standard investigative and forensic techniques which are currently available.” Determining authorship of a typewritten document, whether originally or subsequently put into electronic form, can be approached three ways: “... biometric analysis of the computer user; qualitative analysis of ‘idiosyncrasies’ in the language in questioned and known documents; and quantitative, computational stylometric analysis of the language in questioned and known documents.”

With respect to the Majestic documents, the first method is not possible, since there is no way to analyze actual keystroke pattern dynamics. This method is, in any case, technically non-linguistic. The second method “assesses errors and ‘idiosyncrasies’ based on the examiner’s experience.” This method also has the disadvantage of requiring the pre-existence of a stylistic database against which to measure presumed idiosyncrasies.

The third approach, stylometry, “is quantitative and computational, focusing on readily computable and countable language features, e.g. word length, phrase length, sentence length, vocabulary frequency, distribution of words of different lengths.” Stylometric analysis also may include analysis of function word frequency and punctuation.
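To make the notion of "readily computable and countable language features" concrete, the following is a minimal sketch in Python. It is an illustration only and is not Dr. Chaski's method: the function name `stylometric_features`, the very short function-word list, and the choice of punctuation marks are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for this
# sketch); a real study would use a much larger, principled list.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is")
PUNCTUATION = ",;:.!?"


def stylometric_features(text):
    """Return a dict of simple, countable style features for one text."""
    # Naive sentence split on terminal punctuation; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    n_words = len(words) or 1  # avoid division by zero on empty input
    counts = Counter(words)

    features = {
        "mean_word_length": sum(len(w) for w in words) / n_words,
        "mean_sentence_length": len(words) / (len(sentences) or 1),
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = counts[fw] / n_words
    # Punctuation marks per word.
    for mark in PUNCTUATION:
        features["punct_" + mark] = text.count(mark) / n_words
    return features
```

Each text is reduced to a small set of numbers, which is what allows two documents to be compared computationally rather than by an examiner's impression.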

As one of the leaders in the field of the development of authorship attribution techniques that meet legal standards for evidence, Dr. Chaski has developed “a computational, stylometric method which has obtained 95% accuracy and has been successfully used in investigating and adjudicating several crimes involving digital evidence.”

One final word on the testing enterprise is necessary. It is acknowledged that many of the Majestic documents were not handwritten or even typed by the author to whom they are attributed. The typical practice, especially for presidents, would be to dictate the content of correspondence to a secretary, who would type and reproduce it. This reality is not at odds with Dr. Chaski’s testing methods, since dictated and self-typed memoranda and correspondence are not produced by distinct psycho-linguistic processes. In other words, there is no significant linguistic difference between dictating a letter as one would desire it to be written and typing those thoughts oneself.

2.4. Explanation of the Test Results

In testing the Majestic documents, the first step involved taking the KNOWN documents, those undisputedly written by the person to whom they are attributed, and combining them to get a “stylistic pool” of data for each author.

The second step was to run computational stylistic comparisons between each UNVERIFIED document and its corresponding pool of KNOWN documents.

The third step was to compare each KNOWN document pool to all the other KNOWN document pools for similarity scores. The purpose of this step was to detect how similar or dissimilar one KNOWN document pool was to another KNOWN document pool.

The fourth step was to rank all of the resulting similarity scores. The similarity score of each UNVERIFIED document to its corresponding KNOWN document pool was ranked alongside the similarity scores of the KNOWN document pools compared to each other. An UNVERIFIED document that ranked as more similar to its corresponding KNOWN pool than the KNOWN pools of different authors were to one another would be a “match” with respect to linguistic authorship validation.
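The four steps can be sketched in Python as follows. This is a hedged illustration of the workflow only: the function-word profile and cosine similarity are stand-ins chosen for brevity, not the features or similarity measure used in the study, and the names `profile`, `cosine`, and `rank_scores` are hypothetical.

```python
import math
import re
from collections import Counter

# Stand-in features (an assumption for this sketch): relative frequencies
# of a few function words. The study's actual features are not shown here.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is")


def profile(text):
    """Turn a text into a vector of function-word relative frequencies."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_scores(known_docs, unverified):
    """known_docs: {author: [texts]}; unverified: {label: (attributed_author, text)}."""
    # Step 1: pool each author's KNOWN documents into one stylistic profile.
    pools = {author: profile(" ".join(docs)) for author, docs in known_docs.items()}
    scores = []
    # Step 2: compare each UNVERIFIED document to its attributed author's pool.
    for label, (author, text) in unverified.items():
        score = cosine(profile(text), pools[author])
        scores.append((score, "UNVERIFIED " + label + " vs " + author))
    # Step 3: compare every KNOWN pool to every other KNOWN pool.
    authors = sorted(pools)
    for i, a in enumerate(authors):
        for b in authors[i + 1:]:
            scores.append((cosine(pools[a], pools[b]), "KNOWN " + a + " vs KNOWN " + b))
    # Step 4: rank all similarity scores together, most similar first.
    return sorted(scores, reverse=True)
```

Calling `rank_scores` with a dictionary of KNOWN texts per author and a dictionary of UNVERIFIED documents returns a single ranked list, so the position of each UNVERIFIED comparison can be read against the pool-to-pool comparisons.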

2.5. Results

The results are illustrated below in the next several pages. …


