Ultimate Guide to Understand & Implement Natural Language Processing (with codes in Python)

According to industry estimates, only 21% of the available data is present in structured form. Data is being generated as we speak, as we tweet, as we send messages on WhatsApp and in various other activities. The majority of this data exists in textual form, which is highly unstructured in nature. A few well-known examples include tweets / posts on social media, user-to-user chat conversations, news, blogs and articles, product or service reviews, and patient records in the healthcare sector. More recent examples include chatbots and other voice-driven bots.

Despite having high-dimensional data, the information present in it is not directly accessible unless it is processed (read and understood) manually or analyzed by an automated system. In order to produce significant and actionable insights from text data, it is important to get acquainted with the techniques and principles of Natural Language Processing (NLP). So, if you plan to create chatbots this year, or you want to use the power of unstructured text, this guide is the right starting point. This guide unearths the concepts of natural language processing, its techniques and implementation. The aim of the article is to teach the concepts of natural language processing and apply them on real data sets.

Table of Contents

1. Introduction to NLP
2. Text Preprocessing
   - Noise Removal
   - Lexicon Normalization (Lemmatization, Stemming)
   - Object Standardization
3. Text to Features (Feature Engineering on text data)
   - Syntactical Parsing (Dependency Grammar, Part of Speech Tagging)
   - Entity Parsing (Phrase Detection, Named Entity Recognition, Topic Modelling, N-Grams)
   - Statistical Features (TF-IDF, Frequency / Density Features, Readability Features)
   - Word Embeddings
4. Important tasks of NLP
   - Text Classification
   - Text Matching (Levenshtein Distance, Phonetic Matching, Flexible String Matching)
   - Coreference Resolution
   - Other Problems
5. Important NLP libraries

1. Introduction to Natural Language Processing

NLP is a branch of data science that consists of systematic processes for analyzing, understanding, and deriving information from text data in a smart and efficient manner. By utilizing NLP and its components, one can organize massive chunks of text data, perform numerous automated tasks and solve a wide range of problems such as automatic summarization, machine translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.

Before moving further, I would like to explain some terms that are used in the article:

- Tokenization – the process of converting a text into tokens
- Tokens – words or entities present in the text
- Text object – a sentence, phrase, word or article

A quick illustration of these terms is shown below.
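For instance, NLTK's word_tokenize (used again later in this guide) converts a text object into tokens. This is a minimal sketch; the sample sentence is made up, and it assumes NLTK and its "punkt" tokenizer data are installed:

```
# A minimal illustration of tokenization using NLTK
# (assumes nltk is installed and the "punkt" tokenizer data has been downloaded)
from nltk import word_tokenize

text_object = "Analytics Vidhya is a great source to learn NLP."  # a text object (a sentence)
tokens = word_tokenize(text_object)                               # tokenization
print(tokens)
# ['Analytics', 'Vidhya', 'is', 'a', 'great', 'source', 'to', 'learn', 'NLP', '.']
```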
2. Text Preprocessing

Since text is the most unstructured form of all the available data, various types of noise are present in it and the data is not readily analyzable without pre-processing. The entire process of cleaning and standardizing text, making it noise-free and ready for analysis, is known as text preprocessing. It predominantly comprises three steps:

- Noise Removal
- Lexicon Normalization
- Object Standardization

[Figure: architecture of the text preprocessing pipeline – Noise Removal, Lexicon Normalization, Object Standardization]

2.1 Noise Removal

Any piece of text which is not relevant to the context of the data and the end-output can be specified as noise. For example: language stopwords (commonly used words of a language – is, am, the, of, in etc), URLs or links, social media entities (mentions, hashtags), punctuation and industry-specific words. This step deals with the removal of all types of noisy entities present in the text.

A general approach for noise removal is to prepare a dictionary of noisy entities, and iterate over the text object token by token (or word by word), eliminating those tokens which are present in the noise dictionary. Following is the python code for the same purpose.

```
# Sample code to remove noisy words from a text
noise_list = ["is", "a", "this", "..."]

def _remove_noise(input_text):
    words = input_text.split()
    noise_free_words = [word for word in words if word not in noise_list]
    noise_free_text = " ".join(noise_free_words)
    return noise_free_text

_remove_noise("this is a sample text")
>>> "sample text"
```

Another approach is to use regular expressions while dealing with special patterns of noise. The following python code removes a regex pattern from the input text:

```
# Sample code to remove a regex pattern
import re

def _remove_regex(input_text, regex_pattern):
    urls = re.finditer(regex_pattern, input_text)
    for i in urls:
        input_text = re.sub(i.group().strip(), '', input_text)
    return input_text

regex_pattern = r"#[\w]*"

_remove_regex("remove this #hashtag from here ", regex_pattern)
>>> "remove this  from here "
```

2.2 Lexicon Normalization

Another type of textual noise is the multiple representations exhibited by a single word. For example, "play", "player", "played", "plays" and "playing" are different variations of the word "play". Though they look different, contextually they are all similar. This step converts all the disparities of a word into their normalized form (also known as the lemma). Normalization is a pivotal step for feature engineering with text, as it converts high dimensional features (N different features) into a low dimensional space (1 feature), which is ideal for any ML model. The most common lexicon normalization practices are:

- Stemming: Stemming is a rudimentary rule-based process of stripping suffixes ("ing", "ly", "es", "s" etc) from a word.
- Lemmatization: Lemmatization, on the other hand, is an organized, step-by-step procedure for obtaining the root form of a word. It makes use of vocabulary (dictionary importance of words) and morphological analysis (word structure and grammar relations).

Below is sample code that performs lemmatization and stemming using python's popular library, NLTK.

```
from nltk.stem.wordnet import WordNetLemmatizer
lem = WordNetLemmatizer()

from nltk.stem.porter import PorterStemmer
stem = PorterStemmer()

word = "multiplying"
lem.lemmatize(word, "v")
>> "multiply"
stem.stem(word)
>> "multipli"
```
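To contrast the two approaches on the "play" variations discussed above, here is a small sketch using the same NLTK objects (the outputs shown in comments are what these implementations are expected to produce):

```
# Contrast stemming and lemmatization on the "play" variants discussed above
# (assumes the NLTK WordNet data has been downloaded)
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer

lem = WordNetLemmatizer()
stem = PorterStemmer()

for word in ["played", "plays", "playing", "player"]:
    # lemmatization maps verb forms to the dictionary lemma "play";
    # stemming just strips suffixes by rule
    print(word, "->", lem.lemmatize(word, "v"), "|", stem.stem(word))

# played  -> play   | play
# plays   -> play   | play
# playing -> play   | play
# player  -> player | player   (not a verb form, so neither method reduces it)
```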
2.3 Object Standardization

Text data often contains words or phrases which are not present in any standard lexical dictionary. These pieces are not recognized by search engines and models. Some examples are acronyms, hashtags with attached words, and colloquial slang. With the help of regular expressions and manually prepared data dictionaries, this type of noise can be fixed. The code below uses a dictionary lookup method to replace social media slang in a text.

```
lookup_dict = {'rt':'Retweet', 'dm':'direct message', "awsm" : "awesome", "luv" : "love", "..."}

def _lookup_words(input_text):
    words = input_text.split()
    new_words = []
    for word in words:
        if word.lower() in lookup_dict:
            word = lookup_dict[word.lower()]
        new_words.append(word)
    new_text = " ".join(new_words)
    return new_text

_lookup_words("RT this is a retweeted tweet by Gautam")
>> "Retweet this is a retweeted tweet by Gautam"
```

Apart from the three steps discussed so far, other types of text preprocessing include handling encoding-decoding noise, grammar checking, and spelling correction.

3. Text to Features (Feature Engineering on text data)

To analyse preprocessed data, it needs to be converted into features. Depending upon the usage, text features can be constructed using assorted techniques: syntactical parsing, entities / n-grams / word-based features, statistical features, and word embeddings. Read on to understand these techniques in detail.

3.1 Syntactic Parsing

Syntactical parsing involves the analysis of the words in a sentence for grammar, and their arrangement in a manner that shows the relationships among the words. Dependency grammar and part of speech tags are the important attributes of text syntactics.

Dependency Trees – Sentences are composed of words sewed together. The relationship among the words in a sentence is determined by the basic dependency grammar. Dependency grammar is a class of syntactic text analysis that deals with (labeled) asymmetrical binary relations between two lexical items (words). Every relation can be represented in the form of a triplet (relation, governor, dependent). For example, consider the sentence: "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas." The relationship among the words can be observed in the form of a tree representation, as shown:

[Figure: dependency tree of the example sentence, rooted at "submitted"]

The tree shows that "submitted" is the root word of this sentence, and it is linked by two sub-trees (subject and object subtrees). Each subtree is itself a dependency tree, with relations such as ("Bills" <-> "ports", a "preposition" relation) and ("ports" <-> "immigration", a "conjunction" relation). This type of tree, when parsed recursively in a top-down manner, gives grammar relation triplets as output, which can be used as features for many NLP problems like entity-wise sentiment analysis, actor and entity identification, and text classification. The python wrapper StanfordCoreNLP (by the Stanford NLP Group; GPL-licensed, with commercial licensing available) and NLTK dependency grammars can be used to generate dependency trees; a short sketch using spaCy is shown below.
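spaCy (covered in section 5) ships with a pretrained dependency parser whose output directly yields the (relation, governor, dependent) triplets described above. This is a sketch rather than the original article's toolchain, and it assumes the "en_core_web_sm" model has been downloaded:

```
# Reading dependency relation triplets off a spaCy parse
# (assumes: pip install spacy && python -m spacy download en_core_web_sm)
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Bills on ports and immigration were submitted by Senator Brownback.")

for token in doc:
    # each token carries its relation label (dep_) and its governor (head)
    print((token.dep_, token.head.text, token.text))
```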
Part of Speech Tagging – Apart from the grammar relations, every word in a sentence is also associated with a part of speech (POS) tag (nouns, verbs, adjectives, adverbs etc). The POS tags define the usage and function of a word in the sentence. A full list of possible POS tags (the Penn Treebank tagset) is defined by the University of Pennsylvania. The following code uses NLTK to perform POS tagging annotation on input text (NLTK provides several implementations; the default is the perceptron tagger).

```
from nltk import word_tokenize, pos_tag

text = "I am learning Natural Language Processing on Analytics Vidhya"
tokens = word_tokenize(text)
print(pos_tag(tokens))
>>> [('I', 'PRP'), ('am', 'VBP'), ('learning', 'VBG'), ('Natural', 'NNP'), ('Language', 'NNP'), ('Processing', 'NNP'), ('on', 'IN'), ('Analytics', 'NNP'), ('Vidhya', 'NNP')]
```

Part of speech tagging is used for many important purposes in NLP:

A. Word sense disambiguation: Some words have multiple meanings according to their usage. For example, in the two sentences below:

I. "Please book my flight for Delhi"
II. "I am going to read this book in the flight"

"Book" is used in different contexts, and the part of speech tag differs between the two cases. In sentence I, the word "book" is used as a verb, while in II it is used as a noun. (The Lesk algorithm is also used for similar purposes.)

B. Improving word-based features: A learning model could learn different contexts of a word when words alone are used as features; however, if the part of speech tag is linked with them, the context is preserved, making stronger features. For example:

Sentence: "book my flight, I will read this book"
Tokens: ("book", 2), ("my", 1), ("flight", 1), ("I", 1), ("will", 1), ("read", 1), ("this", 1)
Tokens with POS: ("book_VB", 1), ("my_PRP$", 1), ("flight_NN", 1), ("I_PRP", 1), ("will_MD", 1), ("read_VB", 1), ("this_DT", 1), ("book_NN", 1)

C. Normalization and Lemmatization: POS tags are the basis of the lemmatization process for converting a word to its base form (lemma).

D. Efficient stopword removal: POS tags are also useful in efficient removal of stopwords. For example, there are some tags which always define the low-frequency / less important words of a language, such as: (IN – "within", "upon", "except"), (CD – "one", "two", "hundred"), (MD – "may", "must" etc).

3.2 Entity Extraction (Entities as features)

Entities are defined as the most important chunks of a sentence: noun phrases, verb phrases or both. Entity detection algorithms are generally ensemble models of rule-based parsing, dictionary lookups, POS tagging and dependency parsing. The applicability of entity detection can be seen in automated chatbots, content analyzers and consumer insights. Topic modelling and named entity recognition are the two key entity detection methods in NLP.

A. Named Entity Recognition (NER)

The process of detecting named entities such as person names, location names and company names in text is called NER. For example:

Sentence – Sergey Brin, the manager of Google Inc. is walking in the streets of New York.
Named Entities – ("person": "Sergey Brin"), ("org": "Google Inc."), ("location": "New York")

A typical NER model consists of three blocks:

Noun phrase identification: This step deals with extracting all the noun phrases from a text using dependency parsing and part of speech tagging.

Phrase classification: This is the classification step, in which all the extracted noun phrases are classified into their respective categories (locations, names etc). The Google Maps API provides a good path to disambiguate locations, and open databases such as DBpedia and Wikipedia can be used to identify person names or company names. Apart from this, one can curate lookup tables and dictionaries by combining information from different sources.

Entity disambiguation: Sometimes entities are misclassified, hence creating a validation layer on top of the results is useful. Knowledge graphs can be exploited for this purpose. Popular knowledge graphs include the Google Knowledge Graph, IBM Watson and Wikipedia.

Pretrained NER models are also available off the shelf; a short sketch follows.
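As a sketch (not the article's original toolchain), spaCy's pretrained pipeline performs NER on the example sentence above, assuming the "en_core_web_sm" model is available; the exact labels and spans depend on the model version:

```
# A minimal NER sketch with spaCy, using the example sentence above
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Sergey Brin, the manager of Google Inc. is walking in the streets of New York.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# expected (approximately): "Sergey Brin" PERSON, "Google Inc." ORG, "New York" GPE
```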
B. Topic Modeling

Topic modeling is a process of automatically identifying the topics present in a text corpus; it derives the hidden patterns among the words in the corpus in an unsupervised manner. Topics are defined as "a repeating pattern of co-occurring terms in a corpus". A good topic model results in "health", "doctor", "patient", "hospital" for a topic like Healthcare, and "farm", "crops", "wheat" for a topic like Farming.

Latent Dirichlet Allocation (LDA) is the most popular topic modelling technique. Following is the code to implement topic modeling using LDA in python.

```
doc1 = "Sugar is bad to consume. My sister likes to have sugar, but not my father."
doc2 = "My father spends a lot of time driving my sister around to dance practice."
doc3 = "Doctors suggest that driving may cause increased stress and blood pressure."
doc_complete = [doc1, doc2, doc3]
doc_clean = [doc.split() for doc in doc_complete]

import gensim
from gensim import corpora

# Creating the term dictionary of our corpus, where every unique term is assigned an index.
dictionary = corpora.Dictionary(doc_clean)

# Converting the list of documents (corpus) into a Document-Term Matrix using the dictionary prepared above.
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]

# Creating the object for the LDA model using the gensim library
Lda = gensim.models.ldamodel.LdaModel

# Running and training the LDA model on the document term matrix
ldamodel = Lda(doc_term_matrix, num_topics=3, id2word=dictionary, passes=50)

# Results
print(ldamodel.print_topics())
```

C. N-Grams as Features

A combination of N words together is called an N-gram. N-grams (N > 1) are generally more informative than single words (unigrams) as features, and bigrams (N = 2) are often considered the most important features of all. The following code generates the bigrams of a text.

```
def generate_ngrams(text, n):
    words = text.split()
    output = []
    for i in range(len(words) - n + 1):
        output.append(words[i:i+n])
    return output

>>> generate_ngrams('this is a sample text', 2)
# [['this', 'is'], ['is', 'a'], ['a', 'sample'], ['sample', 'text']]
```

3.3 Statistical Features

Text data can also be quantified directly into numbers using several techniques described in this section:

A. Term Frequency – Inverse Document Frequency (TF – IDF)

TF-IDF is a weighted model commonly used for information retrieval problems. It aims to convert text documents into vector models on the basis of the occurrence of words in the documents, without considering their exact ordering. For example, say there is a dataset of N text documents. For any document "D", TF and IDF are defined as:

- Term Frequency (TF): TF for a term "t" is defined as the count of the term "t" in a document "D".
- Inverse Document Frequency (IDF): IDF for a term is defined as the logarithm of the ratio of the total number of documents in the corpus to the number of documents containing the term.

Combining the two, TF-IDF gives the relative importance of a term in a corpus (a list of documents):

TF-IDF(t, D) = TF(t, D) * log(N / df(t))

where df(t) is the number of documents containing the term t.
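To make the formula concrete, here is a minimal hand computation over a toy corpus that mirrors the one used in the scikit-learn example below (a sketch using the raw formula; real implementations such as scikit-learn add smoothing and normalization on top of it):

```
# Hand computation of TF-IDF for a toy corpus of N = 3 documents
import math

corpus = [
    "this is sample document".split(),
    "another random document".split(),
    "third sample document text".split(),
]
N = len(corpus)

def tf_idf(term, doc):
    tf = doc.count(term)                      # raw count of the term in the document
    df = sum(1 for d in corpus if term in d)  # number of documents containing the term
    return tf * math.log(N / df)

print(tf_idf("sample", corpus[0]))    # in 2 of 3 docs -> 1 * log(3/2) ~ 0.405
print(tf_idf("document", corpus[0]))  # in all 3 docs  -> 1 * log(3/3) = 0.0
```

Note how a term that appears in every document ("document") carries zero weight, while a rarer term ("sample") is weighted up; that is exactly the relative importance the formula encodes.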
Following is code using python's scikit-learn package to convert a text into TF-IDF vectors:

```
from sklearn.feature_extraction.text import TfidfVectorizer

obj = TfidfVectorizer()
corpus = ['This is sample document.', 'another random document.', 'third sample document text']
X = obj.fit_transform(corpus)
print(X)
>>> (0, 1) 0.345205016865
(0, 4) ... 0.444514311537
(2, 1) 0.345205016865
(2, 4) 0.444514311537
```

The model creates a vocabulary dictionary and assigns an index to each word. Each row in the output contains a tuple (i, j) and the tf-idf value of the word at index j in document i.

B. Count / Density / Readability Features

Count- or density-based features can also be used in models and analysis. These features might seem trivial but show a great impact in learning models. Some of these features are: word count, sentence count, punctuation counts and industry-specific word counts. Other types of measures include readability measures such as syllable counts, the SMOG index and Flesch reading ease. Refer to the Textstat library to create such features.

3.4 Word Embedding (text vectors)

Word embedding is the modern way of representing words as vectors. The aim of word embedding is to redefine high dimensional word features as low dimensional feature vectors while preserving the contextual similarity in the corpus. Word embeddings are widely used in deep learning models such as Convolutional Neural Networks and Recurrent Neural Networks.

Word2Vec and GloVe are the two popular models for creating the word embeddings of a text. These models take a text corpus as input and produce word vectors as output. The Word2Vec model is composed of a preprocessing module, a shallow neural network model called Continuous Bag of Words (CBOW) and another shallow neural network model called Skip-gram. It first constructs a vocabulary from the training corpus and then learns word embedding representations. The following code uses the gensim package to prepare word embeddings as vectors (note: in recent gensim versions, similarity lookups live on the model's .wv attribute, as used below).

```
from gensim.models import Word2Vec

sentences = [['data', 'science'], ['vidhya', 'science', 'data', 'analytics'],
             ['machine', 'learning'], ['deep', 'learning']]

# train the model on your corpus
model = Word2Vec(sentences, min_count=1)

print(model.wv.similarity('data', 'science'))
>>> 0.11222489293

print(model.wv['learning'])
>>> array([ 0.00459356, 0.00303564, -0.00467622, 0.00209638, ...])
```

Word vectors can be used as feature vectors for ML models, to measure text similarity using cosine similarity techniques, and for word clustering and text classification.

4. Important tasks of NLP

This section talks about different use cases and problems in the field of natural language processing.

4.1 Text Classification

Text classification is one of the classical problems of NLP. Well-known examples include email spam identification, topic classification of news, sentiment classification, and the organization of web pages by search engines.

Text classification, in common words, is a technique to systematically classify a text object (document or sentence) into one of a fixed set of categories. It is really helpful when the amount of data is too large, especially for organizing, information filtering, and storage purposes.

A typical natural language classifier consists of two parts: (a) training and (b) prediction. First, the text input is processed and features are created.
The machine learning model then learns these features and is used for prediction against new text.

Here is code that uses the Naive Bayes classifier from the TextBlob library (built on top of NLTK).

```
from textblob.classifiers import NaiveBayesClassifier as NBC
from textblob import TextBlob

training_corpus = [
    ('I am exhausted of this work.', 'Class_B'),
    ("I can't cooperate with this", 'Class_B'),
    ('He is my badest enemy!', 'Class_B'),
    ('My management is poor.', 'Class_B'),
    ('I love this burger.', 'Class_A'),
    ('This is an brilliant place!', 'Class_A'),
    ('I feel very good about these dates.', 'Class_A'),
    ('This is my best work.', 'Class_A'),
    ("What an awesome view", 'Class_A'),
    ('I do not like this dish', 'Class_B')]

test_corpus = [
    ("I am not feeling well today.", 'Class_B'),
    ("I feel brilliant!", 'Class_A'),
    ('Gary is a friend of mine.', 'Class_A'),
    ("I can't believe I'm doing this.", 'Class_B'),
    ('The date was good.', 'Class_A'),
    ('I do not enjoy my job', 'Class_B')]

model = NBC(training_corpus)
print(model.classify("Their codes are amazing."))
>>> "Class_A"
print(model.classify("I don't like their computer."))
>>> "Class_B"
print(model.accuracy(test_corpus))
>>> 0.83
```

Scikit-learn also provides a pipeline framework for text classification:

```
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn import svm

# preparing data for the SVM model (using the same training_corpus, test_corpus from the naive bayes example)
train_data = []
train_labels = []
for row in training_corpus:
    train_data.append(row[0])
    train_labels.append(row[1])

test_data = []
test_labels = []
for row in test_corpus:
    test_data.append(row[0])
    test_labels.append(row[1])

# Create feature vectors
vectorizer = TfidfVectorizer(min_df=4, max_df=0.9)

# Train the feature vectors
train_vectors = vectorizer.fit_transform(train_data)

# Apply model on test data
test_vectors = vectorizer.transform(test_data)

# Perform classification with SVM, kernel=linear
model = svm.SVC(kernel='linear')
model.fit(train_vectors, train_labels)
prediction = model.predict(test_vectors)
>>> ['Class_A' 'Class_A' 'Class_B' 'Class_B' 'Class_A' 'Class_A']

print(classification_report(test_labels, prediction))
```
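The vectorization and classification steps above can also be expressed with scikit-learn's Pipeline class, the pipeline framework mentioned before the code. Here is a minimal sketch that reuses the train_data / test_data lists prepared above:

```
# The same TF-IDF + SVM workflow expressed via scikit-learn's Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn import svm

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),        # vectorization step
    ("clf", svm.SVC(kernel="linear")),   # classification step
])

pipeline.fit(train_data, train_labels)        # fits vectorizer and classifier together
predictions = pipeline.predict(test_data)     # transform + predict in one call
```

Wrapping both steps in one estimator keeps the vectorizer's vocabulary tied to the classifier, which avoids the common mistake of fitting the vectorizer separately on test data.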
Text classification models are heavily dependent upon the quality and quantity of features; when applying any machine learning model it is always good practice to include more and more training data. Here are some tips that I wrote about improving text classification accuracy in one of my previous articles.

4.2 Text Matching / Similarity

One of the important areas of NLP is the matching of text objects to find similarities. Important applications of text matching include automatic spelling correction, data de-duplication and genome analysis.

A number of text matching techniques are available depending upon the requirement. This section describes the important techniques in detail.

A. Levenshtein Distance – The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. Following is a memory-efficient implementation (it keeps only two rows of the dynamic programming table at a time).

```
def levenshtein(s1, s2):
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    distances = range(len(s1) + 1)
    for index2, char2 in enumerate(s2):
        newDistances = [index2 + 1]
        for index1, char1 in enumerate(s1):
            if char1 == char2:
                newDistances.append(distances[index1])
            else:
                newDistances.append(1 + min((distances[index1],
                                             distances[index1 + 1],
                                             newDistances[-1])))
        distances = newDistances
    return distances[-1]

print(levenshtein("analyze", "analyse"))
# prints 1 (one substitution: z -> s)
```

B. Phonetic Matching – A phonetic matching algorithm takes a keyword as input (a person's name, a location name etc) and produces a character string that identifies a set of words that are (roughly) phonetically similar. It is very useful for searching large text corpora, correcting spelling errors and matching relevant names. Soundex and Metaphone are the two main phonetic algorithms used for this purpose. Python's Fuzzy module can be used to compute soundex strings for different words, for example:

```
import fuzzy

soundex = fuzzy.Soundex(4)
print(soundex('ankit'))
>>> "A523"
print(soundex('aunkit'))
>>> "A523"
```

C. Flexible String Matching – A complete text matching system includes different algorithms pipelined together to handle a variety of text variations. Regular expressions are really helpful for this purpose as well. Other common techniques include exact string matching, lemmatized matching, and compact matching (which takes care of spaces, punctuation, slang etc).

D. Cosine Similarity – When text is represented in vector notation, a general cosine similarity can be applied in order to measure vectorized similarity. The following code converts texts to vectors (using term frequency) and applies cosine similarity to measure the closeness between two texts.

```
import math
from collections import Counter

def get_cosine(vec1, vec2):
    common = set(vec1.keys()) & set(vec2.keys())
    numerator = sum([vec1[x] * vec2[x] for x in common])
    sum1 = sum([vec1[x]**2 for x in vec1.keys()])
    sum2 = sum([vec2[x]**2 for x in vec2.keys()])
    denominator = math.sqrt(sum1) * math.sqrt(sum2)
    if not denominator:
        return 0.0
    else:
        return float(numerator) / denominator

def text_to_vector(text):
    words = text.split()
    return Counter(words)

text1 = 'This is an article on analytics vidhya'
text2 = 'article on analytics vidhya is about natural language processing'

vector1 = text_to_vector(text1)
vector2 = text_to_vector(text2)
cosine = get_cosine(vector1, vector2)
>>> 0.62
```

4.3 Coreference Resolution

Coreference resolution is the process of finding relational links among the words (or phrases) within sentences. Consider the example: "Donald went to John's office to see the new table. He looked at it for an hour." Humans can quickly figure out that "he" denotes Donald (and not John), and that "it" denotes the table (and not John's office). Coreference resolution is the component of NLP that does this job automatically. It is used in document summarization, question answering, and information extraction. Stanford CoreNLP provides coreference resolution, and Python wrappers for it are available.

4.4 Other NLP problems / tasks

- Text Summarization – Given a text article or paragraph, automatically summarize it to produce the most important and relevant sentences, in order.
- Machine Translation – Automatically translate text from one human language to another, taking care of grammar, semantics, information about the real world, etc.
- Natural Language Generation and Understanding – Converting information from computer databases or semantic intents into readable human language is called language generation.
  Conversely, converting chunks of text into more logical structures that are easier for computer programs to manipulate is called language understanding.
- Optical Character Recognition (OCR) – Given an image representing printed text, determine the corresponding text.
- Document to Information – Parsing the textual data present in documents (websites, files, PDFs and images) into an analyzable and clean format.

5. Important Libraries for NLP (python)

- Scikit-learn – Machine learning in Python.
- Natural Language Toolkit (NLTK) – The complete toolkit for all NLP techniques.
- Pattern – A web mining module for Python, with tools for NLP and machine learning.
- TextBlob – Easy-to-use NLP tools API, built on top of NLTK and Pattern.
- spaCy – Industrial-strength NLP with Python and Cython.
- Gensim – Topic modelling for humans.
- Stanford CoreNLP – NLP services and packages by the Stanford NLP Group.