1) Extraversion (x) (sociable vs shy)
2) Neuroticism (n) (neurotic vs calm)
3) Agreeableness (a) (friendly vs uncooperative)
4) Conscientiousness (c) (organized vs careless)
5) Openness (o) (insightful vs unimaginative)
--> e.g. neuroticism seems to correlate with conscientiousness (shy = conscious?)
using NEO PI-R, a tool to measure t
"In psychology, the Big Five personality traits are five broad domains or dimensions of personality that are used to describe human personality. The theory based on the Big Five factors is called the five-factor model (FFM). The five factors are openness, conscientiousness, extraversion, agreeableness, and neuroticism. Acronyms commonly used to refer to the five traits collectively are OCEAN, NEOAC, or CANOE." http://en.wikipedia.org/wiki/Big_Five_personality_traits
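The hunch noted with the trait list (that two traits might correlate) could be checked once per-author trait scores are available. Below is a minimal sketch, assuming scores are stored as plain lists keyed by the trait letters used above. The `pearson` helper and the numbers in `scores` are made-up placeholders for illustration, not real measurements.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)

# Placeholder scores per author (NOT real data), keyed by trait letter.
scores = {
    "n": [2.0, 3.5, 4.0, 1.5, 3.0],
    "c": [4.0, 3.0, 2.0, 4.5, 3.5],
}
r = pearson(scores["n"], scores["c"])
print(f"r(n, c) = {r:.2f}")
```

A value near +1 or -1 would suggest a strong linear relation between the two traits in the sample, and a value near 0 would suggest none. With only a handful of authors the estimate is very noisy.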
Our dataset: PAN-AP-13 corpus - Author Profiling Shared Task
Trying to understand the where/what/how of this dataset
Larger context: "uncovering plagiarism, authorship, and social software misuse"
"Authorship analysis deals with the classification of texts into classes based on the stylistic choices of their authors. Beyond the author identification and author verification tasks where the style of individual authors is examined, author profiling distinguishes between classes of authors studying their sociolect aspect, that is, how language is shared by people. This helps in identifying profiling aspects such as gender, age, native language, or personality type. Author profiling is a problem of growing importance in applications in forensics, security, and marketing. E.g., from a forensic linguistics perspective one would like being able to know the linguistic profile of the author of a harassing text message (language used by a certain type of people) and identify certain characteristics (language as evidence). Similarly, from a marketing viewpoint, companies may be interested in knowing, on the basis of the analysis of blogs and online product reviews, the demographics of people that like or dislike their products. The focus is on author profiling in social media since we are mainly interested in everyday language and how it reflects basic social and personality processes"
From the Readme:
"Moreover, documents from authors who pretend to be minors have been included (e.g., documents composed of chat lines of sexual predators). For any doubt or problem, please get in touch with us."
"Social media" = chat messages? Different conversations. It seems mixed ... Gijs finds spam and other types of messages.
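To get a quicker feel for how mixed the content is, one option is to print a few text snippets per file and skim them. This is a rough sketch, assuming the corpus is a directory of XML files. The `sample_texts` helper and the `pan13-corpus/en` path are hypothetical, and the code makes no assumption about tag names.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def sample_texts(corpus_dir, per_file=3, max_chars=80):
    """Yield (file name, snippet) pairs for the first few text nodes of each XML file."""
    for path in sorted(Path(corpus_dir).glob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            # Skip files that are not well-formed XML instead of aborting the scan.
            continue
        shown = 0
        for text in root.itertext():
            # Collapse runs of whitespace so each snippet fits on one line.
            snippet = " ".join(text.split())
            if not snippet:
                continue
            yield path.name, snippet[:max_chars]
            shown += 1
            if shown >= per_file:
                break

if __name__ == "__main__":
    # Hypothetical location; point this at wherever the corpus was unpacked.
    for name, snippet in sample_texts("pan13-corpus/en"):
        print(name, "|", snippet)
```

Skimming the output by eye should be enough to spot spam, chat logs, and other message types before doing any real analysis.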