camara94/python-text-mining


In order to be successful in this course, you will need to know how to program in Python. The expectation is that you have completed the first three courses in this Applied Data Science with Python series, specifically Course 1 on Introduction to Data Science in Python and Course 3 on Applied Machine Learning in Python, so that you are familiar with the numpy and pandas Python libraries for data manipulation, and with the scikit-learn toolkit for machine learning algorithms.

Text can be handled at several levels:

  • Sentences / input strings
  • Words or tokens
  • Characters
  • Documents, larger files

In this course, we discuss all of these concepts and their properties in Python.
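As a minimal sketch using only plain string methods and the standard `re` module (naive rules chosen for illustration, not any library's tokenizer), the levels listed above can be pulled out of one tiny document:

```python
import re

# A small "document": the largest unit in the list above
document = "Text mining is fun. Python makes it easy!"

# Sentences: naive split after '.', '!' or '?' followed by whitespace
sentences = re.split(r"(?<=[.!?])\s+", document)

# Words / tokens: naive whitespace split of each sentence
words = [w for s in sentences for w in s.split()]

# Characters: every character of the first word
characters = list(words[0])

print(sentences)   # ['Text mining is fun.', 'Python makes it easy!']
print(words)       # ['Text', 'mining', 'is', 'fun.', 'Python', 'makes', 'it', 'easy!']
print(characters)  # ['T', 'e', 'x', 't']
```

Punctuation stays glued to tokens such as `'fun.'`, which is one reason dedicated tokenizers exist.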

Useful links:

  1. https://docs.python.org/3/library/re.html

  2. https://www.analyticsvidhya.com/blog/2014/11/text-data-cleaning-steps-python/

  3. https://ieva.rocks/2016/08/07/cleaning-text-for-nlp/

  4. https://chrisalbon.com/python/cleaning_text.html
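The links above cover regular expressions and common text-cleaning steps. As a minimal sketch of that idea, the hypothetical helper `clean_text` below chains a few `re.sub` calls; which steps to apply, and in what order, is an assumption here rather than a fixed recipe. The `[^a-z0-9\s]` pattern assumes plain English (ASCII) text, so it would also strip accented letters.

```python
import re

def clean_text(text: str) -> str:
    """Hypothetical helper: apply a few common cleaning steps in order."""
    text = text.lower()                        # normalize case
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # replace punctuation/symbols with spaces
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip()                        # trim leading/trailing spaces

raw = "Check   https://docs.python.org/3/library/re.html -- Regex is GREAT!!"
print(clean_text(raw))  # check regex is great
```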

In this module, we will talk about natural language:

  • Language used for everyday communication by humans
    • English
    • Chinese
    • Spanish

as opposed to artificial computer languages.

Natural Language Processing (NLP):

  • Any computation or manipulation of natural language
  • Natural languages evolve:
    • new words get added
    • old words lose popularity
    • language rules themselves may change

NLP tasks cover a very broad spectrum:

  • Counting words, counting the frequency of words
  • Finding sentence boundaries
  • Part of speech tagging
  • Parsing the sentence structure
  • Identifying semantic roles
  • Identifying entities in a sentence
  • Finding which pronoun refers to which entity

NLTK: Natural Language Toolkit

  • Open-source library in Python
  • Supports most NLP tasks
  • Also provides access to numerous text corpora
  • Importing NLTK:
    import nltk

  • Let's get some text corpora:
    nltk.download()           # opens the interactive NLTK downloader
    from nltk.book import *   # loads the example texts text1 ... text9

    For more information, see the week 2 lab.

  • Recall splitting a sentence into words / tokens
  • Recall high school grammar: nouns, verbs, adjectives,...
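Counting words and their frequencies, the first of the NLP tasks listed above, can be sketched with the standard library alone. NLTK offers its own tokenizers and `nltk.FreqDist` for this; the sketch below instead assumes a simple regular-expression tokenizer plus `collections.Counter`, so its tokens may differ from NLTK's.

```python
import re
from collections import Counter

text = "The cat sat on the mat. The dog sat too."

# Tokenize: lowercase, then keep runs of letters, digits and apostrophes
tokens = re.findall(r"[a-z0-9']+", text.lower())

# Total number of tokens vs. number of distinct words (the vocabulary)
print(len(tokens))       # 10
print(len(set(tokens)))  # 7

# Frequency of each word, most common first
freq = Counter(tokens)
print(freq.most_common(2))  # [('the', 3), ('sat', 2)]
```

Lowercasing first makes "The" and "the" count as the same word; whether that is desirable depends on the task.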
