  The most prestigious law school admissions discussion board in the world.

big tech knows literally everything about you







Date: March 21st, 2018 1:33 PM
Author: up-to-no-good ebony juggernaut

("Networks of Control" by Wolfie Christl, Sarah Spiekermann) http://crackedlabs.org/dl/Christl_Spiekermann_Networks_Of_Control.pdf

The following section explores the possibility of deriving sensitive information about people’s lives from digital records that, on the surface, do not seem to carry much information. It sheds light on what can be inferred from transactional data such as purchases, calls, messages, Likes and searches.

The selection of analysis methods summarized in the following chapters shows that today’s digitally tracked data allows companies to predict many aspects of a person’s personality as well as sensitive personal attributes. Although these methods are based on statistical correlations and probabilities, their outcomes and conclusions are considered good enough to automatically sort, rate and categorize people.

After a brief summary of the often-cited predictive analysis conducted by the U.S. supermarket chain Target, several academic studies on predictive analytics are reviewed. Some of these studies were conducted partly in collaboration with companies like Nokia, Microsoft and Facebook. However, the majority of such analyses and their practical applications are carried out by companies that do not publish details about how they apply predictive analytics.

One of the most cited examples of predicting sensitive information from everyday digital data is the case of the U.S. supermarket chain Target and its attempt to identify pregnant customers based on their shopping behavior. As Charles Duhigg reported in the New York Times and in his book “The Power of Habit” (Duhigg 2012), Target assigns a unique code to each of its customers. All purchases and interactions are recorded, regardless of whether people pay by credit card, use a coupon, fill out a survey, mail in a refund, call the customer help line, open an email from Target or visit its website. Target also buys additional information about its customers from data brokers.

Duhigg spoke extensively with a statistician from Target, whose marketing analytics department was tasked with analyzing customer behavior and finding ways to increase revenue. The statistician reported that one of the simpler tasks was to identify parents with children and send them toy catalogues before Christmas. Another example he gave was identifying customers who bought swimsuits in April and sending them coupons for sunscreen in July and weight-loss books in December. But the main challenge was to identify the major moments in consumers’ lives, for example college graduation, marriage, divorce or moving house, when their shopping behavior becomes “flexible” and the right advertisement or coupon can be effective in causing them to start shopping in new ways. According to a researcher cited by Duhigg, specific advertisements sent at exactly the right time could change a customer’s shopping behavior for years.

One of the most lucrative of these moments would be the birth of a child. The shopping habits of exhausted new parents would be more flexible than at any other point in their lives. According to Target’s statistician, the company identified 25 products that were significant for creating a so-called “pregnancy prediction” score, which could even be used to estimate the expected birth date. It is important to understand that they did not simply look at purchases of baby clothes or buggies, which would be obvious. Instead, they analyzed statistical patterns of people purchasing certain quantities of specific lotions, soaps, hand sanitizers, cotton balls, washcloths or nutritional supplements at precise points in time.
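To make the idea of a purchase-based score concrete, here is a minimal sketch assuming a simple logistic (weighted-sum) model. Target never published its 25 products, their weights or its model, so every product name, weight and the bias below are hypothetical, and the sketch ignores the timing of purchases that the statistician described.

```python
import math

# Hypothetical products and weights: Target never published its 25 products,
# their weights or its model, so all values here are made up for illustration.
HYPOTHETICAL_WEIGHTS = {
    "unscented_lotion_large": 1.2,
    "calcium_supplement": 0.9,
    "magnesium_supplement": 0.8,
    "zinc_supplement": 0.7,
    "cotton_balls_bulk": 0.6,
    "hand_sanitizer_large": 0.5,
    "washcloths": 0.4,
}
HYPOTHETICAL_BIAS = -3.0  # negative intercept: an ordinary basket scores low


def pregnancy_score(basket: dict[str, int]) -> float:
    """Squash a weighted sum of purchase quantities into a 0..1 score (logistic function)."""
    z = HYPOTHETICAL_BIAS + sum(
        HYPOTHETICAL_WEIGHTS.get(item, 0.0) * qty for item, qty in basket.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


ordinary = {"washcloths": 1, "bread": 2}
signal_heavy = {"unscented_lotion_large": 1, "calcium_supplement": 1,
                "magnesium_supplement": 1, "zinc_supplement": 1, "cotton_balls_bulk": 1}
print(round(pregnancy_score(ordinary), 3), round(pregnancy_score(signal_heavy), 3))
```

No single item is decisive here: it is the combination of several individually unremarkable purchases that pushes the score up, which mirrors the point that Target did not rely on obvious items like baby clothes.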

When pregnant women were identified, they received different kinds of personalized advertisements, coupons or other incentives at specific stages of their pregnancy. Duhigg also reported that a father contacted Target and accused the company of encouraging his daughter to get pregnant, because it had sent her coupons for baby clothes. To her father’s surprise, it turned out that the girl was indeed pregnant and had not told him.

Regardless of whether this anecdote is true, Duhigg’s research on Target became one of the most prominent examples of how companies today collect and analyze personal data to influence their customers’ behavior on an individual level.

2.2.2 Predicting sensitive personal attributes from Facebook Likes

A study conducted at the University of Cambridge showed that it is possible to accurately predict ethnicity, religious and political views, relationship status, gender and sexual orientation, as well as a person’s consumption of alcohol, cigarettes and drugs, based on an analysis of Facebook Likes (see Kosinski et al 2013). The analysis was based on data from 58,466 users in the United States who participated in surveys and voluntarily provided demographic information through a Facebook app called myPersonality. This app also analyzed what they “liked” on Facebook, i.e. their positive associations with popular websites or other content in areas such as products, sports, musicians and books.

Researchers were able to automatically predict sensitive personal attributes quite accurately, solely based on an average of 170 Likes per Facebook user:

Predicted attribute | Prediction accuracy
Ethnicity (“Caucasian vs. African American”) | 95%
Gender | 93%
Gay? | 88%
Political views (“Democrat vs. Republican”) | 85%
Religious views (“Christianity vs. Islam”) | 82%
Lesbian? | 75%
Smokes cigarettes? | 73%
Drinks alcohol? | 70%
Uses drugs? | 65%
Single or in a relationship? | 67%
Were the parents still together at 21? | 60%

These figures are not simply the share of users classified correctly. For yes/no attributes, the researchers report the area under the ROC curve (AUC): the 88% for “Gay?” means, for example, that when the model was given one randomly chosen man who had declared himself gay in his demographic data and one who had declared himself heterosexual, its prediction based on Facebook Likes alone ranked them correctly in 88% of cases. The researchers used logistic regression to predict these dichotomous (e.g. yes/no) variables. In addition, they used linear regression to predict numeric variables such as age, for which predicted and actual values correlated at r = 0.75. As the researchers explain, only a “few users were associated with Likes explicitly revealing their attributes”. For example, “less than 5% of users labeled as gay were connected with explicitly gay groups” such as “Being Gay”, “Gay Marriage” or “I love Being Gay”. Predictions rely on less obvious but more popular Likes such as “Britney Spears” or “Desperate Housewives”, which proved to be weak indicators of being gay. It is remarkable that even whether a user’s parents were still together when the user was 21 could be predicted with an accuracy of 60%, modestly above the 50% expected from guessing.
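The two ingredients mentioned here, logistic regression and an AUC-style accuracy figure, can be illustrated with a small sketch. It assumes a tiny, made-up user-by-Like matrix (1 = the user liked the page) with four hypothetical pages; Kosinski et al worked with a far larger matrix and reduced its dimensionality with singular value decomposition before fitting, which this sketch skips. The `auc` function computes the share of (positive, negative) user pairs that the model ranks correctly.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit P(attribute = yes | Likes) with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """Share of (positive, negative) pairs in which the positive scores higher; ties count half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical Like matrix: columns are four made-up pages A, B, C, D.
X = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 1, 0],
     [0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 1, 1], [0, 1, 0, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # self-reported attribute (1 = yes)

w, b = train_logistic_regression(X, y)
scores = [sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) for row in X]
print("weights:", [round(wi, 2) for wi in w], "training AUC:", auc(scores, y))
```

An AUC of 0.5 corresponds to guessing, which is why the 60% figure for the parents’ relationship is only modestly informative while 88% or 95% is strong. A real evaluation would compute the AUC on users held out from training.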

This study shows that sensitive personal attributes, which are usually considered rather private, can be inferred automatically and accurately from rather basic information about online behavior. According to Kosinski et al, Facebook Likes represent a very generic type of digital record about users, similar to web searches, browsing histories and credit card transactions. For example, Facebook Likes related to music and artists are very similar to data about songs listened to or artists searched for online. Yet, in contrast to web searches and purchases, the Likes of Facebook users are publicly accessible by default.

(http://www.autoadmit.com/thread.php?thread_id=3924669&forum_id=2#35653051)




Date: March 21st, 2018 1:41 PM
Author: up-to-no-good ebony juggernaut

The five-factor model of personality, also known as the Big Five model, is one of the leading models in personality psychology. Between 1999 and 2006 alone, it was the subject of nearly 2,000 publications. Many studies have confirmed its reproducibility and consistency across different age groups and cultures. The model is regularly used in the context of predicting user characteristics based on digital data.

According to the “Big Five” model, every person can be rated along five dimensions:

Personality dimension | People rated high in this dimension could be described as
Extraversion | Active, assertive, energetic, enthusiastic, outgoing, talkative
Agreeableness | Appreciative, forgiving, generous, kind, sympathetic, trusting
Conscientiousness | Efficient, organized, planful, reliable, responsible, thorough
Neuroticism | Anxious, self-pitying, tense, touchy, unstable, worrying
Openness | Artistic, curious, imaginative, insightful, original, wide interests

A Swiss study in collaboration with Nokia Research showed that these “Big Five” personality traits can be predicted from smartphone metadata with an accuracy of up to 75.9% (see Chittaranjan et al 2011). First, 83 participants were asked to assess themselves using a questionnaire. Second, their communication behavior was tracked for 8 months using special software installed on their phones. For example, the following data was recorded:

Category | Which data was recorded and analyzed?
App usage | Number of times the following apps were used: Office, Internet, Maps, Mail, Video/Audio/Music, YouTube, Calendar, Camera, Chat, SMS, Games
Call logs | Number of incoming/outgoing/missed calls, number of unique contacts called and unique contacts who called, average duration of incoming/outgoing calls, …
SMS logs | Number of received/sent text messages, number of recipients/senders, average word length, …
Bluetooth | Number of unique Bluetooth IDs, number of times the most common Bluetooth ID is seen, …

Chittaranjan et al. recorded “data that provides information about other data”, also known as metadata, not the contents of the communication. Applying multiple regression analysis, they detected the following significant statistical correlations between smartphone metadata and personality traits (table 3; instead of “neuroticism”, its inverse, “emotional stability”, was used):

For example, participants who received more calls were more likely to be agreeable (r = 0.20) and emotionally stable (r = 0.15). In contrast, participants who used the Office app more were less likely to be open to new experiences (r = -0.26). Correlations with an absolute coefficient below 0.5 are weak, but they still exist.
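For readers unfamiliar with the r values, the sketch below shows how a Pearson correlation coefficient is computed and how the rule of thumb above (|r| below 0.5 counts as weak) would label a value. That the study’s coefficients are Pearson correlations is an assumption, and the call counts and agreeableness scores in the usage lines are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strength(r, threshold=0.5):
    """Label a coefficient using the |r| < 0.5 rule of thumb from the text."""
    return "weak" if abs(r) < threshold else "moderate or strong"


# Invented example: incoming calls per week vs. agreeableness score (1-5).
calls = [12, 30, 7, 22, 15, 40]
agreeableness = [3.1, 3.4, 3.6, 2.9, 3.8, 3.5]
r = pearson_r(calls, agreeableness)
print(round(r, 2), strength(r))
```

A coefficient of r = 0.20 means the two variables move together only loosely: knowing someone’s call count shifts the expected agreeableness score a little, but many individuals will not fit the pattern.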

Furthermore, a machine learning model was developed to automatically classify users based on their smartphone metadata:

Trait | Prediction accuracy
Emotional Stability | 71.5%
Extraversion | 75.9%
Openness to Experience | 69.3%
Conscientiousness | 74.5%
Agreeableness | 69.6%

A newer study from 2015 suggests that computer-based personality judgments could be even more accurate than those made by humans (see Youyou et al 2015). Again, the analysis was based on data obtained through the “myPersonality” Facebook app, and again the researchers Michal Kosinski and David Stillwell were involved. They compared the “accuracy of human and computer-based personality judgments” using questionnaire results from 17,622 participants and data about Facebook Likes from 86,220 participants. Their automated personality predictions based on Facebook Likes (r = 0.56) were more accurate than those of participants’ Facebook friends who filled out a questionnaire about them (r = 0.49). While the judgments of spouses (r = 0.58) were more accurate than the computer models, the judgments of family members (r = 0.50) were also less accurate than the machine predictions.

In addition to the “Big Five” personality traits, Youyou et al further examined “13 life outcomes and traits previously shown to be related to personality”, such as life satisfaction, impulsivity, depression, sensational interests, political orientation, substance use and physical health. As a result, the “validity of the computer judgments” was again “higher than that of human judges in 12 of the 13 criteria”. They state that Facebook Likes “represent one of the most generic kinds of digital footprint” and that their results present “significant opportunities and challenges in the areas of psychological assessment, marketing, and privacy”.



(http://www.autoadmit.com/thread.php?thread_id=3924669&forum_id=2#35653116)




Date: March 21st, 2018 1:42 PM
Author: multi-colored sexy temple

So they know I'm an autistic Russian spy named Borris with a Kia Stinger? Ok. You know that too.

(http://www.autoadmit.com/thread.php?thread_id=3924669&forum_id=2#35653122)




Date: March 21st, 2018 1:45 PM
Author: up-to-no-good ebony juggernaut

Forecasting future movements based on phone data

Based on the analysis of smartphone data from 25 participants, researchers in the U.K. were able to predict the participants’ probable geographic position 24 hours later. In their 2012 study, De Domenico et al exploited the correlation between movement data and social interactions to improve the accuracy of forecasts of a user’s future geographic position.

Using data logs from 25 phones, including “GPS traces, telephone numbers, call and SMS history, Bluetooth and WLAN history”, the scientists forecast the users’ future GPS coordinates based on their own past movements. This resulted in an average error of 1,000 meters.
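An average error in meters between forecast and actual positions can be computed as the mean great-circle distance between coordinate pairs. The sketch below uses the haversine formula with a spherical Earth of radius 6,371 km; whether De Domenico et al measured distance exactly this way is an assumption, and the coordinates in the usage lines are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_error_m(predicted, actual):
    """Average distance between forecast and observed positions."""
    errors = [haversine_m(p[0], p[1], a[0], a[1]) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


# Made-up forecasts vs. observed positions 24 hours later.
predicted = [(52.4862, -1.8904), (52.4500, -1.9300)]
actual = [(52.4862, -1.8904), (52.4590, -1.9300)]
print(round(mean_error_m(predicted, actual)))
```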

When the prediction model was subsequently extended to include mobility data from the user’s friends, the average prediction error could be reduced to less than 20 meters. The friendship relation between two users was derived, for example, from one of them appearing in the other’s address book.

The researchers point out that previous work has already shown that “human movement is predictable to a certain extent at different geographic scales” (De Domenico et al 2012, p. 1). In their study, they acknowledge that their “dataset contains a small number of users, so it is difficult to make claims about the general validity of this finding” (ibid., p. 4). However, the authors show that knowledge about a user’s social contacts can considerably increase the accuracy of predictions about that user. Forecasting people’s movements based on digital records could be used in fields ranging from marketing to government. For example, law enforcement authorities could keep a special eye on people whose movements do not conform to the predicted ones.

De-anonymization and re-identification

In many fields, from scientific research to digital communication technology, data sets that include information on individuals are anonymized or pseudonymized to protect those individuals.

Pseudonymization involves replacing names and other identifying attributes with pseudonyms, for example combinations of letters and digits. The EU General Data Protection Regulation defines it as the “processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information”. When such additional information, for example how names relate to pseudonyms, is known, pseudonymization can easily be reversed. In contrast, the purpose of anonymization is to remove any information that would allow the re-identification of individuals. There are many challenging aspects and concepts around pseudonymity and anonymity (see Pfitzmann and Hansen 2010).
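The point that pseudonymization is reversible once the “additional information” is known can be sketched with keyed hashing. The sketch assumes pseudonyms derived with HMAC-SHA256 from a secret key; this is one common technique, not one the GDPR prescribes, and the key and names are hypothetical. Anyone holding the key and a list of candidate names can rebuild the mapping from pseudonyms back to names.

```python
import hashlib
import hmac

# Hypothetical secret held by the data controller: the "additional information".
SECRET_KEY = b"hypothetical-controller-key"


def pseudonymize(name: str, key: bytes = SECRET_KEY) -> str:
    """Replace a name with a stable pseudonym (first 12 hex characters of an HMAC)."""
    return hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def reidentify(pseudonym: str, candidates, key: bytes = SECRET_KEY):
    """With the key, reverse a pseudonym by re-computing it for candidate names."""
    lookup = {pseudonymize(c, key): c for c in candidates}
    return lookup.get(pseudonym)


record = {"person": pseudonymize("Alice Example"), "diagnosis": "asthma"}
print(record)
print(reidentify(record["person"], ["Bob Example", "Alice Example"]))
```

The same name always yields the same pseudonym, so records about one person stay linkable across the data set even without the key; that linkability is exactly what the re-identification techniques discussed below exploit.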

Besides the fact that assessments differ as to which attributes should be considered “personally identifiable”, many of today’s companies use terms such as “anonymized” or “de-identified” in ambiguous or even wrong ways. There are also fundamental problems concerning anonymization today, as Paul Ohm (2009), for example, has shown. Depending on the kind and quantity of anonymized or pseudonymized data records, it may still be possible to identify a person. If, for example, a small data set does not contain names but does contain initials and birth dates, it is often possible to identify a person by means of additional databases or publicly available information, because the combination of initials and birth date is often unique. A study based on 1990 U.S. census data found that the combination of zip code, gender and birth date was unique for 216 of 248 million U.S. citizens (87%) and therefore makes identification possible. Consequently, data records from which names have been removed but which still include zip codes, gender and birth dates cannot be considered anonymized. It is therefore not sufficient to remove only obviously identifying information such as name, social insurance number or IP address in order to anonymize data records.
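Whether a combination such as zip code, gender and birth date singles people out can be checked by counting how often each combination occurs in a data set. The sketch below does this for a handful of hypothetical records; a record whose combination occurs only once is unique and could be linked to any outside source that carries the same attributes plus a name.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Share of records whose combination of the given attributes occurs exactly once."""
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
records = [
    {"zip": "02138", "gender": "F", "birth_date": "1965-07-31", "diagnosis": "asthma"},
    {"zip": "02138", "gender": "M", "birth_date": "1971-02-14", "diagnosis": "diabetes"},
    {"zip": "02139", "gender": "F", "birth_date": "1980-11-02", "diagnosis": "flu"},
    {"zip": "02139", "gender": "F", "birth_date": "1980-11-02", "diagnosis": "migraine"},
]

print(unique_fraction(records, ["zip"]))                          # 0.0
print(unique_fraction(records, ["zip", "gender", "birth_date"]))  # 0.5
```

Each attribute on its own is shared by several people, but the combination quickly becomes unique; with 87% of the population unique on these three attributes, removing names alone offers little protection.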

The more detailed a data record is, the more potential links it offers to other sources. In addition, the better the technologies used, the easier it is to identify a person, even if the data seems to be anonymized. As more and more varied data about individuals is stored, this issue has become increasingly severe. When, for example, AOL published detailed “anonymous” log files of the web searches of 675,000 users in 2006, some of them could be identified based on their search history alone (see Ohm 2009).

In recent years, elaborate statistical methods for de-anonymization have been developed. When Netflix published an “anonymized” data set containing movie ratings of 500,000 subscribers in 2006, a study showed that a subscriber could easily be identified when a little background knowledge about this person was available. To achieve this, researchers compared and linked the “anonymized” movie ratings of Netflix subscribers with publicly available reviews on the website imdb.com, where users often used their real names. On average, between two and eight reviews from imdb.com were needed to identify persons in the Netflix dataset (see Narayanan and Shmatikov 2008).
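The linkage idea can be illustrated with a deliberately simplified matcher: count how many of a person’s public reviews agree with an “anonymized” record on movie, rating and approximate date, and accept the best candidate only if it clearly wins. This is not Narayanan and Shmatikov’s actual scoring algorithm, which among other things gives more weight to rarely rated movies; the user IDs, movie titles, dates and the 14-day tolerance below are hypothetical.

```python
from datetime import date


def match_score(anon_ratings, public_reviews, max_days=14):
    """Count public reviews that agree with an anonymized record on movie, rating and date."""
    score = 0
    for movie, (rating, day) in public_reviews.items():
        if movie in anon_ratings:
            anon_rating, anon_day = anon_ratings[movie]
            if anon_rating == rating and abs((anon_day - day).days) <= max_days:
                score += 1
    return score


def best_match(anon_dataset, public_reviews, min_score=2):
    """Return the best-matching pseudonymous ID, or None if the match is weak or ambiguous."""
    ranked = sorted(((match_score(r, public_reviews), uid) for uid, r in anon_dataset.items()),
                    reverse=True)
    if not ranked or ranked[0][0] < min_score:
        return None
    if len(ranked) > 1 and ranked[0][0] == ranked[1][0]:
        return None  # two candidates tie: no confident re-identification
    return ranked[0][1]


# Hypothetical "anonymized" dataset: pseudonymous ID -> {movie: (rating, date)}.
anon = {
    "user_0417": {"Movie A": (5, date(2005, 3, 1)), "Movie B": (1, date(2005, 3, 3)),
                  "Movie C": (4, date(2005, 6, 10))},
    "user_2290": {"Movie A": (5, date(2005, 3, 2)), "Movie D": (3, date(2005, 4, 1))},
}
# Hypothetical public reviews posted under a real name.
public = {"Movie A": (5, date(2005, 3, 4)), "Movie B": (1, date(2005, 3, 5))}

print(best_match(anon, public))  # user_0417
```

Once the match succeeds, every other rating in the “anonymized” record, including movies the person never reviewed publicly, is attributed to the named reviewer.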

A study from 2013 analyzed the mobility data of 1.5 million mobile phone users and showed that just four spatio-temporal data points were enough to uniquely identify 95% of the users: the combination of four times and locations at which users made or received calls is highly unique among different people (see Montjoye et al 2013b). According to another study, a combination of just four apps installed on a user’s smartphone was sufficient to re-identify 95% of the users in a data set containing lists of the installed apps of 54,893 smartphone users (Achara et al 2015). It can reasonably be assumed that other types of similar data, such as purchases, search terms, visited websites and Facebook Likes, would provide similar results.
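The “unicity” measurement behind these figures can be sketched as follows: pick a random user, draw p of that user’s data points, and check whether anyone else’s trace also contains all of them. The sketch assumes each user is represented by a set of (hour, cell) points; the same functions work unchanged for sets of installed apps. The traces are hypothetical, and the details of Montjoye et al’s method, such as how times and locations were coarsened, are not reproduced here.

```python
import random


def is_unique(traces, user, p, rng):
    """True if p randomly drawn points of `user` match nobody else's trace."""
    points = set(rng.sample(sorted(traces[user]), p))
    matches = [u for u, trace in traces.items() if points <= trace]
    return matches == [user]


def unicity(traces, p, trials=1000, seed=0):
    """Estimated share of users singled out by p of their own data points."""
    rng = random.Random(seed)
    eligible = [u for u, trace in traces.items() if len(trace) >= p]
    hits = sum(is_unique(traces, rng.choice(eligible), p, rng) for _ in range(trials))
    return hits / trials


# Hypothetical traces: user -> set of (hour of day, antenna/cell ID).
traces = {
    "u1": {(8, "cellA"), (12, "cellB"), (18, "cellC")},
    "u2": {(8, "cellA"), (12, "cellB"), (18, "cellD")},
    "u3": {(8, "cellA"), (13, "cellE"), (18, "cellC")},
}
for p in (1, 2, 3):
    print(p, unicity(traces, p))
```

A single point such as “near cellA at 8 am” is shared by everyone here, but as p grows the drawn points quickly stop fitting anyone else’s trace.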

Academic studies aside, such technologies are already used in practice to re-identify users. For example, online marketers and data brokers use browser fingerprints or device fingerprints to re-identify users based on the specific characteristics of their web browsers and devices (see Bujlow et al 2015). Biometric data from iris, voice and face recognition, as well as analyses of keystroke and mouse dynamics (see Mudholkar 2012), can also be used to re-identify people, akin to traditional fingerprints or DNA profiles.
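Browser fingerprinting can be sketched as hashing a canonical serialization of attributes that a website can observe. The attribute names and values below are hypothetical examples, real fingerprinting scripts collect many more signals, and the 16-character SHA-256 prefix is an arbitrary choice for readability.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a canonical JSON form of the observed attributes into a short ID."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attributes a website might observe; no cookie is involved.
visit_1 = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "1920x1080",
    "timezone": "Europe/Vienna",
    "language": "de-AT",
    "fonts": ["Arial", "DejaVu Sans", "Liberation Serif"],
}
# A later visit reports the same attributes in a different order.
visit_2 = dict(reversed(list(visit_1.items())))

print(fingerprint(visit_1) == fingerprint(visit_2))  # True: same device re-identified
```

Because the identifier is derived from the device’s characteristics rather than stored on it, deleting cookies does not change it; it changes only when one of the observed attributes changes.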

(http://www.autoadmit.com/thread.php?thread_id=3924669&forum_id=2#35653144)




Date: March 21st, 2018 3:41 PM
Author: up-to-no-good ebony juggernaut



(http://www.autoadmit.com/thread.php?thread_id=3924669&forum_id=2#35654025)