c:itamc:2017
c:itamc:2017 [2017/11/13 14:03] – hkimscil | c:itamc:2017 [2022/06/13 08:35] (current) – [e.g. 1] hkimscil | ||
---|---|---|---|
Introduction | Introduction | ||
Introduction to [[:social network analysis]] | Introduction to [[:social network analysis]] | ||
- | + | * [[amazon> | |
+ | * {{amazon> | ||
+ | |||
+ | data file: {{: | ||
+ | Create a Textmining directory in the R working directory. | ||
+ | Unzip the zip file. | ||
+ | ====== e.g. 1 ====== | ||
+ | < | ||
+ | NeededPackages <- c(" | ||
+ | " | ||
+ | install.packages(NeededPackages, | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | docs <- Corpus(DirSource(" | ||
+ | docs | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | writeLines(as.character(docs[[30]])) | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | |||
+ | </ | ||
+ | |||
+ | < | ||
+ | #create the toSpace content transformer | ||
+ | toSpace <- content_transformer(function(x, | ||
+ | </ | ||
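The toSpace definition above is cut off in this revision, so its body is not visible here. As a minimal sketch, assuming it simply substitutes a space for every match of a pattern (as the name suggests), the same idea can be tried in base R without tm. The helper name `to_space_plain` and the sample strings are hypothetical and only serve the illustration.

```r
# Base-R stand-in for a "pattern to space" transformer.
# to_space_plain is a hypothetical helper, not part of the tm package.
to_space_plain <- function(x, pattern) {
  gsub(pattern, " ", x)  # replace every match of pattern with one space
}

# Made-up sample strings, not taken from the corpus
sample_text <- c("risk-based approach", "cost:benefit")

cleaned <- to_space_plain(sample_text, "-")
cleaned <- to_space_plain(cleaned, ":")
print(cleaned)
```

In tm itself a function like this would be wrapped with content_transformer() before being passed to tm_map(), so that it is applied to the content of each document in the corpus.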
+ | |||
+ | < | ||
+ | docs <- tm_map(docs, | ||
+ | docs <- tm_map(docs, | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | docs <- tm_map(docs, | ||
+ | |||
+ | docs <- tm_map(docs, | ||
+ | docs <- tm_map(docs, | ||
+ | docs <- tm_map(docs, | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | docs <- tm_map(docs, | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | docs <- tm_map(docs, | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | #remove stopwords using the standard list in tm | ||
+ | docs <- tm_map(docs, | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | #Strip whitespace (cosmetic? | ||
+ | docs <- tm_map(docs, | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | writeLines(as.character(docs[[30]])) | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | #load library | ||
+ | library(SnowballC) | ||
+ | |||
+ | #Stem document | ||
+ | docs <- tm_map(docs, | ||
+ | writeLines(as.character(docs[[30]])) | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | docs <- tm_map(docs, | ||
+ | docs <- tm_map(docs, | ||
+ | docs <- tm_map(docs, | ||
+ | docs <- tm_map(docs, | ||
+ | docs <- tm_map(docs, | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | dtm | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | inspect(dtm[1: | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | length(freq) | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | #create sort order (descending) | ||
+ | ord <- order(freq, | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | #inspect most frequently occurring terms | ||
+ | freq[head(ord)] | ||
+ | |||
+ | #inspect least frequently occurring terms | ||
+ | freq[tail(ord)] | ||
+ | </ | ||
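The frequency ranking above can be checked on a toy matrix. This is a minimal sketch with made-up counts, assuming the truncated order() call sorts in descending order as its comment says; `decreasing = TRUE` is a standard argument of base R's order(), and the `toy_` names are hypothetical.

```r
# Toy document-term counts: rows are documents, columns are terms.
# The numbers are invented for illustration only.
toy_dtm <- matrix(c(2, 0, 1,
                    3, 1, 1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("doc1.txt", "doc2.txt"),
                                  c("organ", "hmmm", "project")))

# Total occurrences of each term across all documents
toy_freq <- colSums(toy_dtm)

# Index vector that sorts the terms from most to least frequent
toy_ord <- order(toy_freq, decreasing = TRUE)

toy_freq[head(toy_ord, 1)]  # most frequent term
toy_freq[tail(toy_ord, 1)]  # least frequent term
```

Because the ordering is descending, head() picks the most frequent terms and tail() the rarest ones, which matches the comments in the block above.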
+ | |||
+ | < | ||
+ | dtmr < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | freqr <- colSums(as.matrix(dtmr)) | ||
+ | #length should be total number of terms | ||
+ | length(freqr) | ||
+ | |||
+ | #create sort order (descending) | ||
+ | ordr <- order(freqr, | ||
+ | |||
+ | #inspect most frequently occurring terms | ||
+ | freqr[head(ordr)] | ||
+ | |||
+ | #inspect least frequently occurring terms | ||
+ | freqr[tail(ordr)] | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | findFreqTerms(dtmr, | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | findAssocs(dtmr, | ||
+ | findAssocs(dtmr, | ||
+ | findAssocs(dtmr, | ||
+ | </ | ||
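findAssocs() reports terms whose counts across documents correlate with a given term at or above a limit. The arguments used above are truncated, so the following is only a sketch of the underlying idea in base R, with made-up counts and an assumed limit of 0.6; the variable names are hypothetical.

```r
# Toy counts for three terms over four documents (invented numbers)
toy <- cbind(project = c(4, 0, 2, 1),
             manag   = c(3, 0, 2, 1),
             system  = c(0, 5, 1, 0))

# Correlation of "project" with every term, computed over documents
assoc <- cor(toy)["project", ]

# Drop the term itself, then keep only associations at or above the assumed limit
assoc <- assoc[names(assoc) != "project"]
strong <- round(assoc[assoc >= 0.6], 2)
print(strong)
```

Here "manag" rises and falls with "project" across the documents and so passes the limit, while "system" moves in the opposite direction and is dropped.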
+ | |||
+ | < | ||
+ | wf=data.frame(term=names(freqr), | ||
+ | library(ggplot2) | ||
+ | p <- ggplot(subset(wf, | ||
+ | p <- p + geom_bar(stat=" | ||
+ | p <- p + theme(axis.text.x=element_text(angle=45, | ||
+ | p | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | library(wordcloud) | ||
+ | #setting the same seed each time ensures consistent look across clouds | ||
+ | set.seed(42) | ||
+ | #limit words by specifying min frequency | ||
+ | wordcloud(names(freqr), | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | #…add color | ||
+ | wordcloud(names(freqr), | ||
+ | |||
+ | </ | ||
+ | |||
+ | ====== e.g. 1 with output ====== | ||
+ | |||
+ | < | ||
+ | [1] " | ||
+ | > library(tm) | ||
+ | > #Create Corpus | ||
+ | > docs <- Corpus(DirSource(" | ||
+ | > docs | ||
+ | << | ||
+ | Metadata: | ||
+ | Content: | ||
+ | > #inspect a particular document | ||
+ | > writeLines(as.character(docs[[30]])) | ||
+ | Understanding " | ||
+ | |||
+ | Introduction | ||
+ | Flexibility is one of those buzzwords that keeps coming up in organizational communiques and discussions. People are continually asked to display flexibility, | ||
+ | When words are used in this way they become platitudes ?empty words that make a lot of noise. In this post, I analyse the platitude, flexibility, | ||
+ | Background ?a bit about organizational platitudes | ||
+ | One of the things that struck me when I moved from academia to industry is the difference in the way words or phrases are used in the two domains. In academics one has to carefully define the terms one uses (particularly if one is coining a new term) whereas in business it doesn' | ||
+ | A good example of a platitude is the word governance. One manager may see governance as being largely about oversight and control whereas another might interpret it as being about providing guidance. | ||
+ | Flexibility ?the conventional view | ||
+ | A good place to start our discussion of flexibility is with the dictionary. The online Oxford Dictionary defines at as: | ||
+ | Flexibility (noun): | ||
+ | the ability to be easily modified | ||
+ | | ||
+ | The term is widely used in both these senses in organizational settings. For example, people speak of flexible designs (i.e. designs that can be easily modified) or flexible people (referring to those who are willing to change or compromise). However, | ||
+ | Jobs are flexible in the sense that they are unstable and uncertain, few employees hold the same jobs for many years, the content of jobs can be changed almost overnight, and the boundaries between work and leisure are negotiable and chronically fuzzy. | ||
+ | Indeed, such " | ||
+ | Understanding flexibility | ||
+ | Consider the following definition of flexibility proposed by Gregory Bateson: | ||
+ | " | ||
+ | This deceptively simple statement is a good place to start understanding what flexibility really means for projects, organisations and even software systems. | ||
+ | As Eriksen tells us, Bateson proposed this definition in the context of ecology. In particular, Bateson had in mind the now obvious notion that the increased flexibility we gain through our increasingly energy-hungry lifestyles results in a decrease in the environment' | ||
+ | Another implication of the above definition is that a system that is running at or near the limits of its operating variables cannot be flexible. | ||
+ | A project team that is putting in 18 hour workdays in order to finish a project on time. | ||
+ | A car that's being driven at top speed. | ||
+ | A family living beyond their means. | ||
+ | All these systems are operating at or near their limits, they have little or no spare capacity to accommodate change. | ||
+ | A third implication of the definition follows from the preceding one: the key variables of a flexible system should lie in the mid-range of their upper and lower limits. In terms of above examples: | ||
+ | The project team should be putting in normal hours. | ||
+ | The car should be driven at or below the posted road speed limits | ||
+ | The family should be living within its income, with a reasonable amount to spare. | ||
+ | Of course, the whole point of ensuring that systems operate in their comfort zone is that they can be revved up if the need arises. Such revving up, however, | ||
+ | Flexibility in the workplace | ||
+ | As mentioned in the introduction, | ||
+ | The term flexibility is often used to describe this new situation: Jobs are flexible in the sense that they are unstable and uncertain, few employees hold the same jobs for many years, the content of jobs can be changed, and the boundaries between work and leisure are poorly defined. | ||
+ | This trend is aided by recent developments in technology that enable employees to be perpetually on call. This is often sold as a work from home initiative but usually ends up being much more. Eriksen has this to say about home offices: | ||
+ | One recent innovation typically associated with flexibility is the home office. In Scandinavia (and some other prosperous, technologically optimistic regions), many companies equipped some of their employees with home computers with online access to the company network in the early 1990s, in order to enhance their flexibility. This was intended to enable employees to work from home part of the time, thereby making the era when office workers were chained to the office desk all day obsolete. | ||
+ | In the early days, there were widespread worries among employers to the effect that a main outcome of this new flexibility would consist in a reduction of productivity. Since there was no legitimate way of checking how the staff actually spent their time out of the office, it was often suspected that they worked less from home than they were supposed to. If this were in fact the case, working from home would have led to a real increase in the flexibility of time budgeting. However, work researchers eventually came up with a different picture. By the late 1990s, hardly anybody spoke of the home office as a convenient way of escaping from work; rather, the concern among unionists as well as researchers was now that increasing numbers of employees were at pains to distinguish between working hours and leisure time, and were suffering symptoms of burnout and depression. The home office made it difficult to distinguish between contexts that were formerly mutually exclusive because of differ... < | ||
+ | It is interesting to see this development in the light of Bateson' | ||
+ | There seems to be a classic Batesonian flexibility trade-off associated with the new information technologies: | ||
+ | In short, it appears that flexibility for the organization necessarily implies a loss of flexibility for the individual. | ||
+ | Conclusion | ||
+ | Flexibility is in the eye of the beholder: an action to increase organisational flexibility by, say, redeploying employees would likely be seen by those affected as a move that constrains their (individual) flexibility. | ||
+ | > getTransformations() | ||
+ | [1] " | ||
+ | > #create the toSpace content transformer | ||
+ | > toSpace <- content_transformer(function(x, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Remove punctuation - replace punctuation marks with " " | ||
+ | > docs <- tm_map(docs, | ||
+ | > | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Transform to lower case (need to wrap in content_transformer) | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Strip digits (std transformation, | ||
+ | > docs <- tm_map(docs, | ||
+ | > #remove stopwords using the standard list in tm | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Strip whitespace (cosmetic? | ||
+ | > docs <- tm_map(docs, | ||
+ | > writeLines(as.character(docs[[30]])) | ||
+ | understanding flexibility ?close view organizational platitude introduction flexibility one buzzwords keeps coming organizational communiques discussions people continually asked display flexibility without ever told term means flexible workplaces flexible attitudes flexible jobs ?word flexible meaning depends context used words used way become platitudes ?empty words make lot noise post analyse platitude flexibility used organisations discussion based paper thomas eriksen entitled mind gap flexibility epistemology rhetoric new work background ?bit organizational platitudes one things struck moved academia industry difference way words phrases used two domains academics one carefully define terms one uses particularly one coining new term whereas business doesnt seem matter words can mean whatever one wants mean ok exaggeration much indeed paul culmsee discuss first chapter heretics guide best practices many terms commonly bandied organizations platitudes understood differently differe... < | ||
+ | > #load library | ||
+ | > library(SnowballC) | ||
+ | > | ||
+ | > #Stem document | ||
+ | > docs <- tm_map(docs, | ||
+ | > writeLines(as.character(docs[[30]])) | ||
+ | understand flexibl ?close view organiz platitud introduct flexibl one buzzword keep come organiz communiqu discuss peopl continu ask display flexibl without ever told term mean flexibl workplac flexibl attitud flexibl job ?word flexibl mean depend context use word use way becom platitud ?empti word make lot nois post analys platitud flexibl use organis discuss base paper thoma eriksen entitl mind gap flexibl epistemolog rhetor new work background ?bit organiz platitud one thing struck move academia industri differ way word phrase use two domain academ one care defin term one use particular one coin new term wherea busi doesnt seem matter word can mean whatev one want mean ok exagger much inde paul culmse discuss first chapter heret guid best practic mani term common bandi organ platitud understood differ differ peopl good exampl platitud word govern one manag may see govern larg oversight control wherea anoth might interpret provid guidanc vari interpret can result major differ way two... < | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > dtm <- DocumentTermMatrix(docs) | ||
+ | > dtm | ||
+ | << | ||
+ | Non-/sparse entries: 13979/ | ||
+ | Sparsity | ||
+ | Maximal term length: 48 | ||
+ | Weighting | ||
+ | > inspect(dtm[1: | ||
+ | << | ||
+ | Non-/sparse entries: 0/12 | ||
+ | Sparsity | ||
+ | Maximal term length: 7 | ||
+ | Weighting | ||
+ | Sample | ||
+ | Terms | ||
+ | Docs | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | bigdata.txt | ||
+ | > freq <- colSums(as.matrix(dtm)) | ||
+ | > #length should be total number of terms | ||
+ | > length(freq) | ||
+ | [1] 3902 | ||
+ | > #create sort order (descending) | ||
+ | > ord <- order(freq, | ||
+ | > #inspect most frequently occurring terms | ||
+ | > freq[head(ord)] | ||
+ | | ||
+ | | ||
+ | > | ||
+ | > #inspect least frequently occurring terms | ||
+ | > freq[tail(ord)] | ||
+ | therebi timeorgan | ||
+ | 1 | ||
+ | > dtmr < | ||
+ | > dtmr | ||
+ | << | ||
+ | Non-/sparse entries: 10071/ | ||
+ | Sparsity | ||
+ | Maximal term length: 15 | ||
+ | Weighting | ||
+ | > freqr <- colSums(as.matrix(dtmr)) | ||
+ | > #length should be total number of terms | ||
+ | > length(freqr) | ||
+ | [1] 1294 | ||
+ | > | ||
+ | > #create sort order (descending) | ||
+ | > ordr <- order(freqr, | ||
+ | > | ||
+ | > #inspect most frequently occurring terms | ||
+ | > freqr[head(ordr)] | ||
+ | organ | ||
+ | 275 | ||
+ | > | ||
+ | > #inspect least frequently occurring terms | ||
+ | > freqr[tail(ordr)] | ||
+ | hmmm struck multin | ||
+ | | ||
+ | > findFreqTerms(dtmr, | ||
+ | [1] " | ||
+ | [12] " | ||
+ | [23] " | ||
+ | [34] " | ||
+ | > findAssocs(dtmr, | ||
+ | $project | ||
+ | | ||
+ | 0.82 | ||
+ | |||
+ | > findAssocs(dtmr, | ||
+ | $enterpris | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | > findAssocs(dtmr, | ||
+ | $system | ||
+ | design | ||
+ | 0.78 | ||
+ | | ||
+ | 0.61 | ||
+ | |||
+ | > wf=data.frame(term=names(freqr), | ||
+ | > library(ggplot2) | ||
+ | > p <- ggplot(subset(wf, | ||
+ | > p <- p + geom_bar(stat=" | ||
+ | > p <- p + theme(axis.text.x=element_text(angle=45, | ||
+ | > p | ||
+ | > # | ||
+ | > library(wordcloud) | ||
+ | > #setting the same seed each time ensures consistent look across clouds | ||
+ | > set.seed(42) | ||
+ | > #limit words by specifying min frequency | ||
+ | > wordcloud(names(freqr), | ||
+ | > #…add color | ||
+ | > wordcloud(names(freqr), | ||
+ | > dtmr | ||
+ | << | ||
+ | Non-/sparse entries: 10071/ | ||
+ | Sparsity | ||
+ | Maximal term length: 15 | ||
+ | Weighting | ||
+ | > inspect(dtmr[1: | ||
+ | Error in x$nrow : $ operator is invalid for atomic vectors | ||
+ | > inspect(dtmr) | ||
+ | << | ||
+ | Non-/sparse entries: 10071/ | ||
+ | Sparsity | ||
+ | Maximal term length: 15 | ||
+ | Weighting | ||
+ | Sample | ||
+ | | ||
+ | Docs approach differ exampl manag organ problem project system will work | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | ConditionsOverCauses.txt | ||
+ | EmergentDesignInEnterpriseIT.txt | ||
+ | FromInformationToKnowledge.txt | ||
+ | MakingSenseOfOrganizationalChange.txt | ||
+ | MakingSenseOfSensemaking.txt | ||
+ | RoutinesAndReality.txt | ||
+ | SixHeresiesForBI.txt | ||
+ | TheEssenceOfEntrepreneurship.txt | ||
+ | ThreeTypesOfUncertainty.txt | ||
+ | > dtm | ||
+ | << | ||
+ | Non-/sparse entries: 13979/ | ||
+ | Sparsity | ||
+ | Maximal term length: 48 | ||
+ | Weighting | ||
+ | > dtmr | ||
+ | << | ||
+ | Non-/sparse entries: 10071/ | ||
+ | Sparsity | ||
+ | Maximal term length: 15 | ||
+ | Weighting | ||
+ | > inspect(dtm) | ||
+ | << | ||
+ | Non-/sparse entries: 13979/ | ||
+ | Sparsity | ||
+ | Maximal term length: 48 | ||
+ | Weighting | ||
+ | Sample | ||
+ | | ||
+ | Docs can manag one organ problem project system use way work | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | ConditionsOverCauses.txt | ||
+ | EmergentDesignInEnterpriseIT.txt | ||
+ | FromInformationToKnowledge.txt | ||
+ | MakingSenseOfOrganizationalChange.txt | ||
+ | MakingSenseOfSensemaking.txt | ||
+ | RoutinesAndReality.txt | ||
+ | SixHeresiesForBI.txt | ||
+ | TheEssenceOfEntrepreneurship.txt | ||
+ | ThreeTypesOfUncertainty.txt | ||
+ | > Sys.setlocale(category = " | ||
+ | [1] " | ||
+ | > library(tm) | ||
+ | > #Create Corpus | ||
+ | > docs <- Corpus(DirSource(" | ||
+ | > docs | ||
+ | << | ||
+ | Metadata: | ||
+ | Content: | ||
+ | > getTransformations() | ||
+ | [1] " | ||
+ | [5] " | ||
+ | > #create the toSpace content transformer | ||
+ | > toSpace <- content_transformer(function(x, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Remove punctuation - replace punctuation marks with " " | ||
+ | > docs <- tm_map(docs, | ||
+ | > | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Transform to lower case (need to wrap in content_transformer) | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Strip digits (std transformation, | ||
+ | > docs <- tm_map(docs, | ||
+ | > #remove stopwords using the standard list in tm | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Strip whitespace (cosmetic? | ||
+ | > docs <- tm_map(docs, | ||
+ | > writeLines(as.character(docs[[30]])) | ||
+ | understanding flexibility ?close view organizational platitude introduction flexibility one buzzwords keeps coming organizational communiques discussions people continually asked display flexibility without ever told term means flexible workplaces flexible attitudes flexible jobs ?word flexible meaning depends context used words used way become platitudes ?empty words make lot noise post analyse platitude flexibility used organisations discussion based paper thomas eriksen entitled mind gap flexibility epistemology rhetoric new work background ?bit organizational platitudes one things struck moved academia industry difference way words phrases used two domains academics one carefully define terms one uses particularly one coining new term whereas business doesnt seem matter words can mean whatever one wants mean ok exaggeration much indeed paul culmsee discuss first chapter heretics guide best practices many terms commonly bandied organizations platitudes understood differently differe... < | ||
+ | > #load library | ||
+ | > library(SnowballC) | ||
+ | > | ||
+ | > #Stem document | ||
+ | > docs <- tm_map(docs, | ||
+ | > writeLines(as.character(docs[[30]])) | ||
+ | understand flexibl ?close view organiz platitud introduct flexibl one buzzword keep come organiz communiqu discuss peopl continu ask display flexibl without ever told term mean flexibl workplac flexibl attitud flexibl job ?word flexibl mean depend context use word use way becom platitud ?empti word make lot nois post analys platitud flexibl use organis discuss base paper thoma eriksen entitl mind gap flexibl epistemolog rhetor new work background ?bit organiz platitud one thing struck move academia industri differ way word phrase use two domain academ one care defin term one use particular one coin new term wherea busi doesnt seem matter word can mean whatev one want mean ok exagger much inde paul culmse discuss first chapter heret guid best practic mani term common bandi organ platitud understood differ differ peopl good exampl platitud word govern one manag may see govern larg oversight control wherea anoth might interpret provid guidanc vari interpret can result major differ way two... < | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > dtm <- DocumentTermMatrix(docs) | ||
+ | > dtm | ||
+ | << | ||
+ | Non-/sparse entries: 13979/ | ||
+ | Sparsity | ||
+ | Maximal term length: 48 | ||
+ | Weighting | ||
+ | > inspect(dtm[1: | ||
+ | << | ||
+ | Non-/sparse entries: 0/12 | ||
+ | Sparsity | ||
+ | Maximal term length: 7 | ||
+ | Weighting | ||
+ | Sample | ||
+ | Terms | ||
+ | Docs | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | bigdata.txt | ||
+ | > freq <- colSums(as.matrix(dtm)) | ||
+ | > #length should be total number of terms | ||
+ | > length(freq) | ||
+ | [1] 3902 | ||
+ | > #create sort order (descending) | ||
+ | > ord <- order(freq, | ||
+ | > #inspect most frequently occurring terms | ||
+ | > freq[head(ord)] | ||
+ | | ||
+ | | ||
+ | > | ||
+ | > #inspect least frequently occurring terms | ||
+ | > freq[tail(ord)] | ||
+ | therebi timeorgan | ||
+ | 1 | ||
+ | > dtmr < | ||
+ | > dtmr | ||
+ | << | ||
+ | Non-/sparse entries: 10071/ | ||
+ | Sparsity | ||
+ | Maximal term length: 15 | ||
+ | Weighting | ||
+ | > freqr <- colSums(as.matrix(dtmr)) | ||
+ | > #length should be total number of terms | ||
+ | > length(freqr) | ||
+ | [1] 1294 | ||
+ | > | ||
+ | > #create sort order (descending) | ||
+ | > ordr <- order(freqr, | ||
+ | > | ||
+ | > #inspect most frequently occurring terms | ||
+ | > freqr[head(ordr)] | ||
+ | organ | ||
+ | 275 | ||
+ | > | ||
+ | > #inspect least frequently occurring terms | ||
+ | > freqr[tail(ordr)] | ||
+ | hmmm struck multin | ||
+ | | ||
+ | > findFreqTerms(dtmr, | ||
+ | [1] " | ||
+ | [7] " | ||
+ | [13] " | ||
+ | [19] " | ||
+ | [25] " | ||
+ | [31] " | ||
+ | [37] " | ||
+ | [43] " | ||
+ | > findAssocs(dtmr, | ||
+ | $project | ||
+ | | ||
+ | 0.82 | ||
+ | |||
+ | > findAssocs(dtmr, | ||
+ | $enterpris | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | 0.62 | ||
+ | |||
+ | > findAssocs(dtmr, | ||
+ | $system | ||
+ | design | ||
+ | 0.78 | ||
+ | specif | ||
+ | 0.66 | ||
+ | cognit | ||
+ | 0.60 | ||
+ | |||
+ | > wf=data.frame(term=names(freqr), | ||
+ | > library(ggplot2) | ||
+ | > p <- ggplot(subset(wf, | ||
+ | > p <- p + geom_bar(stat=" | ||
+ | > p <- p + theme(axis.text.x=element_text(angle=45, | ||
+ | > p | ||
+ | > # | ||
+ | > library(wordcloud) | ||
+ | > #setting the same seed each time ensures consistent look across clouds | ||
+ | > set.seed(42) | ||
+ | > #limit words by specifying min frequency | ||
+ | > wordcloud(names(freqr), | ||
+ | > #…add color | ||
+ | > wordcloud(names(freqr), | ||
+ | </ |
c/itamc/2017.1510551199.txt.gz · Last modified: 2017/11/13 14:03 by hkimscil