c:itamc:2018
Differences
This shows you the differences between two versions of the page.
c:itamc:2018 [2018/11/06 17:53] – [e.g. 1 with output] hkimscil | c:itamc:2018 [2019/11/12 15:43] (current) – hkimscil | ||
---|---|---|---|
Line 8: | Line 8: | ||
Create a Textmining directory in the R working directory. | Create a Textmining directory in the R working directory. | ||
Unzip the zip file. | Unzip the zip file. | ||
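The two setup steps above can be sketched in base R. This is only an illustration: `setup_textmining_dir` is a hypothetical helper, and the archive name `docs.zip` is a placeholder because the page does not name the zip file. The example call targets a temporary directory; pass `getwd()` to use the R working directory instead.

```r
# Hypothetical helper (not from the page): creates a "Textmining" folder
# under base_dir and, if an existing zip archive is given, extracts it there.
setup_textmining_dir <- function(base_dir = getwd(), zip_file = NULL) {
  target <- file.path(base_dir, "Textmining")
  if (!dir.exists(target)) {
    dir.create(target, recursive = TRUE)
  }
  # Only unzip when an archive was supplied and is actually present.
  if (!is.null(zip_file) && file.exists(zip_file)) {
    utils::unzip(zip_file, exdir = target)
  }
  target
}

# "docs.zip" is a placeholder name; if it is absent the unzip step is skipped.
textmining_dir <- setup_textmining_dir(
  base_dir = tempdir(),
  zip_file = file.path(tempdir(), "docs.zip")
)
print(dir.exists(textmining_dir))
```

Returning the target path makes it easy to hand the folder to a later step that reads the text files.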
- | ====== e.g. 1 ====== | + | ====== |
< | < | ||
NeededPackages <- c(" | NeededPackages <- c(" | ||
Line 168: | Line 168: | ||
set.seed(42) | set.seed(42) | ||
#limit words by specifying min frequency | #limit words by specifying min frequency | ||
- | wordcloud(names(freqr), | + | wordcloud(names(freqr), |
</ | </ | ||
< | < | ||
#…add color | #…add color | ||
- | wordcloud(names(freqr), | + | wordcloud(names(freqr), |
</ | </ | ||
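The idea behind limiting a word cloud by minimum frequency can be shown without the plotting package. The sketch below assumes a toy named vector in place of `freqr` (the counts are made up) and an arbitrary threshold of 5, since the actual arguments in the calls above are truncated on this page.

```r
# Toy named frequency vector standing in for freqr (made-up counts).
freq_toy <- c(organ = 12, manag = 9, project = 7, hmmm = 1, struck = 1)

# Arbitrary threshold chosen for illustration only.
min_freq <- 5

# Keep only the terms that occur at least min_freq times,
# then list them from most to least frequent.
kept <- sort(freq_toy[freq_toy >= min_freq], decreasing = TRUE)
print(names(kept))
```

Terms below the threshold are simply dropped before plotting, which keeps the cloud readable when the vocabulary has many rare words.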
+ | ====== output ====== | ||
+ | < | ||
+ | Loading required package: | ||
+ | Attaching package: | ||
+ | |||
+ | The following object is masked from ‘package: | ||
+ | |||
+ | annotate | ||
+ | |||
+ | Warning messages: | ||
+ | 1: package ‘tm’ was built under R version 3.4.4 | ||
+ | 2: package ‘NLP’ was built under R version 3.4.4 | ||
+ | > #Create Corpus | ||
+ | > docs <- Corpus(DirSource(" | ||
+ | > docs | ||
+ | << | ||
+ | Metadata: | ||
+ | Content: | ||
+ | > #inspect a particular document | ||
+ | > writeLines(as.character(docs[[30]])) | ||
+ | TOGAF or not TOGAF but is that the question? | ||
+ | |||
+ | " | ||
+ | |||
+ | " | ||
+ | |||
+ | I spent much of last week attending a class on the TOGAF Enterprise Architecture (EA) framework. | ||
+ | |||
+ | One of the things about that struck me about TOGAF is the way in which the components of the framework hang together to make a coherent whole (see the introductory chapter of the framework for an overview). To be sure, there is a lot of detail within those components, but there is a certain abstract elegance <96> dare I say, beauty <96> to the framework. | ||
+ | |||
+ | That said TOGAF is (almost) entirely silent on the following question which I addressed in a post late last year: | ||
+ | |||
+ | Why is Enterprise Architecture so hard to get right? | ||
+ | |||
+ | Many answers have been offered. Here are some, extracted from articles published by IT vendors and consultancies: | ||
+ | |||
+ | Lack of sponsorship | ||
+ | Not engaging the business | ||
+ | Inadequate communication | ||
+ | Insensitivity to culture / policing mentality | ||
+ | Clinging to a particular tool or framework | ||
+ | Building an ivory tower | ||
+ | Wrong choice of architect | ||
+ | (Note: the above points are taken from this article and this one) | ||
+ | |||
+ | It is interesting that the first four issues listed are related to the fact that different stakeholders in an organization have vastly different perspectives on what an enterprise architecture initiative should achieve. | ||
+ | |||
+ | Interestingly, | ||
+ | |||
+ | TOGAF offers enterprise architects a wealth of tools to manage technical complexity. These need to be complemented by a suite of techniques to reconcile worldviews of different stakeholder groups. | ||
+ | |||
+ | < | ||
+ | |||
+ | Apart from social complexity, there is the problem of context <96> the circumstances that shape the unique culture and features of an organization. | ||
+ | |||
+ | Some may argue that the framework acknowledges this and encourages, even exhorts, people to tailor the framework to their needs. Sure, the word " | ||
+ | |||
+ | On a related note, the TOGAF framework acknowledges that there is a hierarchy of architectures ranging from the general (foundation) to the specific (organization). However despite the acknowledgement of diversity, | ||
+ | |||
+ | I have often heard arguments along the lines of "80% of what we do follows a standard process, so it should be easy for us to standardize on a framework." | ||
+ | |||
+ | To sum up, frameworks like TOGAF are abstractions based on an ideal organization; | ||
+ | > getTransformations() | ||
+ | [1] " | ||
+ | [4] " | ||
+ | > #create the toSpace content transformer | ||
+ | > toSpace <- content_transformer(function(x, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Remove punctuation: replace punctuation marks with " " | ||
+ | > docs <- tm_map(docs, | ||
+ | > | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Transform to lower case (need to wrap in content_transformer) | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Strip digits (std transformation, | ||
+ | > docs <- tm_map(docs, | ||
+ | > #remove stopwords using the standard list in tm | ||
+ | > docs <- tm_map(docs, | ||
+ | > #Strip whitespace (cosmetic?) | ||
+ | > docs <- tm_map(docs, | ||
+ | > writeLines(as.character(docs[[30]])) | ||
+ | togaf togaf question holy grail effective collaboration creating shared understanding precursor shared commitment jeff conklin without context words actions meaning gregory bateson spent much last week attending class togaf enterprise architecture ea framework prior experience frameworks pmbok itil taught much depends instructor good one can make material come alive whereas good one can make experience akin watching grass grow neednt worried instructor superb classmates experienced professionals architects livened proceedings comments discussions class outside thoroughly enjoyable educative experience something say many professional courses attended one things struck togaf way components framework hang together make coherent whole see introductory chapter framework overview sure lot detail within components certain abstract elegance dare say beauty framework said togaf almost entirely silent following question addressed post late last year enterprise architecture hard get right many an... < | ||
+ | > #load library | ||
+ | > library(SnowballC) | ||
+ | Warning message: | ||
+ | package ‘SnowballC’ was built under R version 3.4.4 | ||
+ | > | ||
+ | > #Stem document | ||
+ | > docs <- tm_map(docs, | ||
+ | > writeLines(as.character(docs[[30]])) | ||
+ | togaf togaf question holi grail effect collabor creat share understand precursor share commit jeff conklin without context word action mean gregori bateson spent much last week attend class togaf enterpris architectur ea framework prior experi framework pmbok itil taught much depend instructor good one can make materi come aliv wherea good one can make experi akin watch grass grow neednt worri instructor superb classmat experienc profession architect liven proceed comment discuss class outsid thorough enjoy educ experi someth say mani profession cours attend one thing struck togaf way compon framework hang togeth make coher whole see introductori chapter framework overview sure lot detail within compon certain abstract eleg dare say beauti framework said togaf almost entir silent follow question address post late last year enterpris architectur hard get right mani answer offer extract articl publish vendor consult lack sponsorship engag busi inadequ communic insensit cultur polic menta... < | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > docs <- tm_map(docs, | ||
+ | > dtm <- DocumentTermMatrix(docs) | ||
+ | > dtm | ||
+ | << | ||
+ | Non-/sparse entries: 13977/ | ||
+ | Sparsity | ||
+ | Maximal term length: 54 | ||
+ | Weighting | ||
+ | > inspect(dtm[1: | ||
+ | << | ||
+ | Non-/sparse entries: 0/12 | ||
+ | Sparsity | ||
+ | Maximal term length: 12 | ||
+ | Weighting | ||
+ | Sample | ||
+ | Terms | ||
+ | Docs decid decis defineeffici | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | bigdata.txt | ||
+ | Terms | ||
+ | Docs degre demand devoid | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | bigdata.txt | ||
+ | > inspect(dtm) | ||
+ | << | ||
+ | Non-/sparse entries: 13977/ | ||
+ | Sparsity | ||
+ | Maximal term length: 54 | ||
+ | Weighting | ||
+ | Sample | ||
+ | Terms | ||
+ | Docs can manag one organ | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | ConditionsOverCauses.txt | ||
+ | EmergentDesignInEnterpriseIT.txt | ||
+ | FromInformationToKnowledge.txt | ||
+ | MakingSenseOfOrganizationalChange.txt | ||
+ | MakingSenseOfSensemaking.txt | ||
+ | RoutinesAndReality.txt | ||
+ | SixHeresiesForBI.txt | ||
+ | TheEssenceOfEntrepreneurship.txt | ||
+ | ThreeTypesOfUncertainty.txt | ||
+ | Terms | ||
+ | Docs problem project system | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | ConditionsOverCauses.txt | ||
+ | EmergentDesignInEnterpriseIT.txt | ||
+ | FromInformationToKnowledge.txt | ||
+ | MakingSenseOfOrganizationalChange.txt | ||
+ | MakingSenseOfSensemaking.txt | ||
+ | RoutinesAndReality.txt | ||
+ | SixHeresiesForBI.txt | ||
+ | TheEssenceOfEntrepreneurship.txt | ||
+ | ThreeTypesOfUncertainty.txt | ||
+ | Terms | ||
+ | Docs use way work | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | ConditionsOverCauses.txt | ||
+ | EmergentDesignInEnterpriseIT.txt | ||
+ | FromInformationToKnowledge.txt | ||
+ | MakingSenseOfOrganizationalChange.txt | ||
+ | MakingSenseOfSensemaking.txt | ||
+ | RoutinesAndReality.txt | ||
+ | SixHeresiesForBI.txt | ||
+ | TheEssenceOfEntrepreneurship.txt | ||
+ | ThreeTypesOfUncertainty.txt | ||
+ | > freq <- colSums(as.matrix(dtm)) | ||
+ | > #length should be total number of terms | ||
+ | > length(freq) | ||
+ | [1] 3892 | ||
+ | > #create sort order (descending) | ||
+ | > ord <- order(freq, decreasing=TRUE) | ||
+ | > #inspect most frequently occurring terms | ||
+ | > freq[head(ord)] | ||
+ | | ||
+ | | ||
+ | > | ||
+ | > #inspect least frequently occurring terms | ||
+ | > freq[tail(ord)] | ||
+ | therebi timeorgan | ||
+ | 1 | ||
+ | > # word length 4 or more | ||
+ | > dtmr < | ||
+ | > dtmr | ||
+ | << | ||
+ | Non-/sparse entries: 10082/30063 | ||
+ | Sparsity | ||
+ | Maximal term length: 15 | ||
+ | Weighting | ||
+ | > inspect(dtmr) | ||
+ | << | ||
+ | Non-/sparse entries: 10082/30063 | ||
+ | Sparsity | ||
+ | Maximal term length: 15 | ||
+ | Weighting | ||
+ | Sample | ||
+ | Terms | ||
+ | Docs differ exampl manag | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | ConditionsOverCauses.txt | ||
+ | EmergentDesignInEnterpriseIT.txt | ||
+ | FromInformationToKnowledge.txt | ||
+ | MakingSenseOfOrganizationalChange.txt | ||
+ | MakingSenseOfSensemaking.txt | ||
+ | RoutinesAndReality.txt | ||
+ | SixHeresiesForBI.txt | ||
+ | TheEssenceOfEntrepreneurship.txt | ||
+ | ThreeTypesOfUncertainty.txt | ||
+ | Terms | ||
+ | Docs organ problem project | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | ConditionsOverCauses.txt | ||
+ | EmergentDesignInEnterpriseIT.txt | ||
+ | FromInformationToKnowledge.txt | ||
+ | MakingSenseOfOrganizationalChange.txt | ||
+ | MakingSenseOfSensemaking.txt | ||
+ | RoutinesAndReality.txt | ||
+ | SixHeresiesForBI.txt | ||
+ | TheEssenceOfEntrepreneurship.txt | ||
+ | ThreeTypesOfUncertainty.txt | ||
+ | Terms | ||
+ | Docs question system will | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | ConditionsOverCauses.txt | ||
+ | EmergentDesignInEnterpriseIT.txt | ||
+ | FromInformationToKnowledge.txt | ||
+ | MakingSenseOfOrganizationalChange.txt | ||
+ | MakingSenseOfSensemaking.txt | ||
+ | RoutinesAndReality.txt | ||
+ | SixHeresiesForBI.txt | ||
+ | TheEssenceOfEntrepreneurship.txt | ||
+ | ThreeTypesOfUncertainty.txt | ||
+ | Terms | ||
+ | Docs work | ||
+ | BeyondEntitiesAndRelationships.txt | ||
+ | ConditionsOverCauses.txt | ||
+ | EmergentDesignInEnterpriseIT.txt | ||
+ | FromInformationToKnowledge.txt | ||
+ | MakingSenseOfOrganizationalChange.txt | ||
+ | MakingSenseOfSensemaking.txt | ||
+ | RoutinesAndReality.txt | ||
+ | SixHeresiesForBI.txt | ||
+ | TheEssenceOfEntrepreneurship.txt | ||
+ | ThreeTypesOfUncertainty.txt | ||
+ | > freqr <- colSums(as.matrix(dtmr)) | ||
+ | > #length should be total number of terms | ||
+ | > length(freqr) | ||
+ | [1] 1295 | ||
+ | > | ||
+ | > #create sort order (descending) | ||
+ | > ordr <- order(freqr, | ||
+ | > | ||
+ | > #inspect most frequently occurring terms | ||
+ | > freqr[head(ordr)] | ||
+ | organ | ||
+ | 276 | ||
+ | > | ||
+ | > #inspect least frequently occurring terms | ||
+ | > freqr[tail(ordr)] | ||
+ | hmmm struck multin | ||
+ | | ||
+ | > findFreqTerms(dtmr, | ||
+ | [1] " | ||
+ | [5] " | ||
+ | [9] " | ||
+ | [13] " | ||
+ | [17] " | ||
+ | [21] " | ||
+ | [25] " | ||
+ | [29] " | ||
+ | [33] " | ||
+ | [37] " | ||
+ | [41] " | ||
+ | > findAssocs(dtmr, | ||
+ | $project | ||
+ | inher | ||
+ | | ||
+ | |||
+ | > findAssocs(dtmr, | ||
+ | $enterpris | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | architectur | ||
+ | | ||
+ | |||
+ | > findAssocs(dtmr, | ||
+ | $system | ||
+ | design | ||
+ | 0.78 | ||
+ | intend | ||
+ | 0.68 | ||
+ | phone frequent | ||
+ | 0.64 | ||
+ | |||
+ | > wf=data.frame(term=names(freqr), | ||
+ | > library(ggplot2) | ||
+ | > p <- ggplot(subset(wf, | ||
+ | > p <- p + geom_bar(stat=" | ||
+ | > p <- p + theme(axis.text.x=element_text(angle=45, | ||
+ | > p | ||
+ | > #wordcloud | ||
+ | > library(wordcloud) | ||
+ | Loading required package: | ||
+ | Warning messages: | ||
+ | 1: package ‘wordcloud’ was built under R version 3.4.4 | ||
+ | 2: package ‘RColorBrewer’ was built under R version 3.4.4 | ||
+ | > #setting the same seed each time ensures consistent look across clouds | ||
+ | > set.seed(42) | ||
+ | > #limit words by specifying min frequency | ||
+ | > wordcloud(names(freqr), | ||
+ | There were 50 or more warnings (use warnings() to see the first 50) | ||
+ | > #…add color | ||
+ | > wordcloud(names(freqr), | ||
+ | There were 50 or more warnings (use warnings() to see the first 50) | ||
+ | > | ||
+ | </ | ||
+ | |||
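The session above cleans each document, builds a document-term matrix, and sums its columns to get term frequencies. The following base-R sketch mirrors those steps on three made-up sentences so the mechanics are visible without the tm package. The documents, the short stopword list, and the helper `clean_text` are all assumptions for illustration, and stemming is left out.

```r
# Three tiny made-up documents standing in for the corpus files.
docs_demo <- c(
  a = "Projects need 2 managers; managers manage projects.",
  b = "The system design: a system of systems!",
  c = "Managers design the project system."
)

# Illustrative stopword list (not the list shipped with tm).
stop_demo <- c("the", "a", "of", "need")

# Lower-case, replace punctuation and digits with spaces, split into words,
# and drop stopwords and empty strings.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("[[:digit:]]", " ", x)
  words <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  words[nzchar(words) & !(words %in% stop_demo)]
}

tokens <- lapply(docs_demo, clean_text)
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary term per document; rows are documents, columns terms.
counts <- vapply(
  tokens,
  function(w) tabulate(match(w, vocab), nbins = length(vocab)),
  integer(length(vocab))
)
dtm_demo <- t(counts)
colnames(dtm_demo) <- vocab

# Total frequency of each term across all documents, most frequent first.
freq_demo <- colSums(dtm_demo)
ord <- order(freq_demo, decreasing = TRUE)
print(freq_demo[head(ord, 3)])
```

Because no stemming is applied here, "project" and "projects" stay separate columns, which is the kind of split the stemming step in the session is meant to reduce.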
+ | {{: | ||
+ | {{: |
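The `findAssocs` calls in the output report terms whose counts correlate with a given term at or above a limit. A rough base-R approximation of that idea is sketched below. It is not tm's implementation: `assoc_sketch` is a hypothetical helper, the count matrix is made up, and the limit of 0.5 is chosen only for the example.

```r
# Toy document-term counts (made up): rows are documents, columns are terms.
dtm_toy <- matrix(
  c(4, 3, 0,
    2, 2, 1,
    0, 1, 3,
    1, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("system", "design", "project"))
)

# Hypothetical helper: Pearson correlation between one term's counts and
# every other term's counts, keeping those at or above corlimit.
assoc_sketch <- function(m, term, corlimit) {
  others <- setdiff(colnames(m), term)
  r <- vapply(others, function(o) cor(m[, term], m[, o]), numeric(1))
  r <- round(r, 2)
  sort(r[r >= corlimit], decreasing = TRUE)
}

res <- assoc_sketch(dtm_toy, "system", 0.5)
print(res)
```

In this toy matrix "design" rises and falls with "system" across documents while "project" moves the opposite way, so only "design" passes the limit.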
c/itamc/2018.1541494411.txt.gz · Last modified: 2018/11/06 17:53 by hkimscil