Feature Engineering
Text Preprocessing and Feature Engineering for Textual Data
Natural Language Processing
Aim
Text preprocessing level-two operations: feature engineering of textual data
- To implement label encoding and one hot encoding on textual data.
- To implement Bag of Words (BoW) feature engineering technique on textual data.
- To implement TF-IDF feature engineering technique.
- To analyze and comprehend the effect of various approaches to convert text into vectors.
Prerequisite
- Python
Outcome
After successful completion of this experiment, students will be able to understand concepts, significance, and limitations of various feature engineering methods applied on textual data.
Theory
Representing words in an appropriate numerical form is called feature engineering for text. Common approaches include label encoding, one hot encoding, bag of words, and TF-IDF.
One Hot Encoding
Algorithm
- Create a vocabulary from the given corpus.
- Assign each word a binary vector as long as the vocabulary, with a 1 at the word's own position and 0 everywhere else.
Example
After tokenization, punctuation removal, and stop word removal: ['india', 'country', 'occupies', 'greater', 'part', 'south', 'asia', 'capital', 'new', 'delhi']
One-hot vectors for each word in the vocabulary:
Word | india | country | occupies | greater | part | south | asia | capital | new | delhi |
---|---|---|---|---|---|---|---|---|---|---|
india | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
country | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
occupies | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
greater | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
part | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
south | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
asia | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
capital | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
new | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
delhi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
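The two steps above can be sketched in plain Python as follows. This assumes the already-preprocessed token list from the example; `one_hot_encode` is a hypothetical helper, and the vocabulary keeps first-occurrence order so that the vectors line up with the table.

```python
# Hypothetical helper: one-hot encode a list of preprocessed tokens
def one_hot_encode(tokens):
    # Step 1: vocabulary of unique words, kept in first-occurrence order
    vocabulary = list(dict.fromkeys(tokens))
    # Step 2: one binary vector per word, with a single 1 at the word's position
    vectors = {}
    for position, word in enumerate(vocabulary):
        vector = [0] * len(vocabulary)
        vector[position] = 1
        vectors[word] = vector
    return vocabulary, vectors


tokens = ['india', 'country', 'occupies', 'greater', 'part',
          'south', 'asia', 'capital', 'new', 'delhi']
vocabulary, vectors = one_hot_encode(tokens)
for word in vocabulary:
    print(word, vectors[word])
```

Each vector has as many entries as the vocabulary has words, which is why one-hot vectors become very long and sparse for realistic corpora.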
Label Encoding
Algorithm
- Create a vocabulary from the given corpus.
- Assign each word a unique integer; in the example below, this is its 1-based position in the alphabetically sorted vocabulary.
Example
After tokenization, punctuation removal, and stop word removal: ['india', 'country', 'occupies', 'greater', 'part', 'south', 'asia', 'capital', 'new', 'delhi']
After sorting the vocabulary alphabetically and assigning 1-based indices: {'asia': 1, 'capital': 2, 'country': 3, 'delhi': 4, 'greater': 5, 'india': 6, 'new': 7, 'occupies': 8, 'part': 9, 'south': 10}
Output after label encoding:
Sentence | india | country | occupies | greater | part | south | asia | capital | new | delhi |
---|---|---|---|---|---|---|---|---|---|---|
Sentence1 | 6 | 3 | 8 | 5 | 9 | 10 | 1 | 2 | 7 | 4 |
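A plain-Python sketch of the same steps, assuming the preprocessed token list from the example and 1-based indices as in the table (scikit-learn's `LabelEncoder`, used in task a, starts numbering at 0 instead):

```python
tokens = ['india', 'country', 'occupies', 'greater', 'part',
          'south', 'asia', 'capital', 'new', 'delhi']

# Step 1: vocabulary sorted alphabetically
vocabulary = sorted(set(tokens))
# Step 2: map each word to a 1-based integer index
word_to_index = {word: index for index, word in enumerate(vocabulary, start=1)}
# Encode the sentence by replacing every token with its index
encoded = [word_to_index[word] for word in tokens]
print(word_to_index)
print(encoded)  # [6, 3, 8, 5, 9, 10, 1, 2, 7, 4]
```

Note that the integers imply an ordering (asia < capital < country ...) that carries no linguistic meaning, which is one reason one-hot encoding is often preferred for nominal data.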
Bag of Words (BoW)
A bag-of-words model is a way of extracting features from text for use in modeling, such as with machine learning algorithms. It involves building a vocabulary of known words and recording how often (or whether) each of them occurs in a document, disregarding word order.
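To illustrate what "disregarding word order" means, the minimal sketch below counts words with `collections.Counter`. The two sentences and the `bag_of_words` helper are made up for illustration; because the sentences contain the same words in a different order, they get identical vectors.

```python
from collections import Counter


def bag_of_words(doc, vocabulary):
    # Lowercase, split on whitespace, and count each vocabulary word (missing words count 0)
    counts = Counter(doc.lower().split())
    return [counts[word] for word in vocabulary]


docs = ["the cat chased the dog", "the dog chased the cat"]
vocabulary = sorted({w for d in docs for w in d.lower().split()})
vectors = [bag_of_words(d, vocabulary) for d in docs]
print(vocabulary)  # ['cat', 'chased', 'dog', 'the']
print(vectors)     # [[1, 1, 1, 2], [1, 1, 1, 2]]
```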
TF-IDF (Term Frequency-Inverse Document Frequency)
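TF-IDF is another feature engineering technique that weighs the importance of words in a document relative to the entire corpus: a word's weight grows with how often it occurs in the document (term frequency) and shrinks with how many documents in the corpus contain it (inverse document frequency), so words common to every document receive low weights.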
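One common way to make this precise is the variant scikit-learn's TfidfVectorizer uses by default (smooth_idf=True, norm='l2'), written here for a corpus of n documents, where tf(t, d) is the raw count of term t in document d, df(t) is the number of documents containing t, and x_d is the vector of tfidf(t, d) values over the vocabulary. Other texts often use the unsmoothed idf(t) = log(n / df(t)); adding one to the counts prevents division by zero.

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\cdot\mathrm{idf}(t),
\qquad
\mathbf{v}_d = \frac{\mathbf{x}_d}{\lVert \mathbf{x}_d \rVert_2}
```

Under this definition a term that appears in every document gets the minimum idf of 1, so it ends up with the smallest weight in each normalized row.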
This README gives an overview of text preprocessing operations in NLP; the tasks below implement each of these feature engineering techniques in Python.
# import libraries used
import re  # used for punctuation removal in the Bag of Words task

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
import nltk
from nltk import sent_tokenize, word_tokenize, RegexpTokenizer, pos_tag
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.probability import FreqDist
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
Tasks
a. To implement label encoding and one hot encoding on textual data
Perform preprocessing operations such as lowercasing, tokenization, and stop word removal before operating on the data:
sentence = "This is a demo text used for testing the various built in methods of nltk library"
# Case uniformity
sentence = sentence.lower()
# Tokenization and Stopword removal
stopword = stopwords.words('english')
word_tokens = nltk.word_tokenize(sentence)
removing_stopwords = [word for word in word_tokens if word not in stopword]
print(removing_stopwords)
['demo', 'text', 'used', 'testing', 'various', 'built', 'methods', 'nltk', 'library']
Label Encoding:
# Creating an initial dataframe
dog_types = ("affenpinscher",
"Afghan hound",
"Airedale terrier",
"Akita",
"Alaskan Malamute",
"American Staffordshire terrier",
"American water spaniel",
"Australian cattle dog",
"Australian shepherd",
"Australian terrier",
"basenji",
"basset hound",
"beagle",
"bearded collie",
"Bedlington terrier",
"Bernese mountain dog",
"bichon frise",
"black and tan coonhound",
"bloodhound",
"border collie",
"border terrier",
"borzoi",
"Boston terrier",
"bouvier des Flandres",
"boxer",
"briard",
"Brittany",
"Brussels griffon",
"bull terrier",
"bulldog",
"bullmastiff",
"cairn terrier",
"Canaan dog",
"Chesapeake Bay retriever",
"Chihuahua",
"Chinese crested",
"Chinese shar-pei",
"chow chow",
"Clumber spaniel",
"cocker spaniel",
"collie",
"curly-coated retriever",
"dachshund",
"Dalmatian",
"Doberman pinscher",
"English cocker spaniel",
"English setter",
"English springer spaniel",
"English toy spaniel",
"Eskimo dog",
"Finnish spitz",
"flat-coated retriever",
"fox terrier",
"foxhound",
"French bulldog",
"German shepherd",
"German shorthaired pointer",
"German wirehaired pointer",
"golden retriever",
"Gordon setter",
"Great Dane",
"greyhound",
"Irish setter",
"Irish water spaniel",
"Irish wolfhound",
"Jack Russell terrier",
"Japanese spaniel",
"keeshond",
"Kerry blue terrier",
"komondor",
"kuvasz",
"Labrador retriever",
"Lakeland terrier",
"Lhasa apso",
"Maltese",
"Manchester terrier",
"mastiff",
"Mexican hairless",
"Newfoundland",
"Norwegian elkhound",
"Norwich terrier",
"otterhound",
"papillon",
"Pekingese",
"pointer",
"Pomeranian",
"poodle",
"pug",
"puli",
"Rhodesian ridgeback",
"Rottweiler",
"Saint Bernard",
"saluki",
"Samoyed",
"schipperke",
"schnauzer",
"Scottish deerhound",
"Scottish terrier",
"Sealyham terrier",
"Shetland sheepdog",
"shih tzu",
"Siberian husky",
"silky terrier",
"Skye terrier",
"Staffordshire bull terrier",
"soft-coated wheaten terrier",
"Sussex spaniel",
"spitz",
"Tibetan terrier",
"vizsla",
"Weimaraner",
"Welsh terrier",
"West Highland white terrier",
"whippet",
"Yorkshire terrier")
dogs_df = pd.DataFrame(dog_types, columns = ['Dog_Types'])
# Creating instance of labelencoder
labelencoder = LabelEncoder()
dogs_df['Dog_Types_Categories'] = labelencoder.fit_transform(dogs_df['Dog_Types'])
dogs_df
Index | Dog_Types | Dog_Types_Categories |
---|---|---|
0 | affenpinscher | 68 |
1 | Afghan hound | 0 |
2 | Airedale terrier | 1 |
3 | Akita | 2 |
4 | Alaskan Malamute | 3 |
... | ... | ... |
110 | Weimaraner | 64 |
111 | Welsh terrier | 65 |
112 | West Highland white terrier | 66 |
113 | whippet | 114 |
114 | Yorkshire terrier | 67 |
115 rows × 2 columns
dogs_df['Dog_Types_Categories']
0 68
1 0
2 1
3 2
4 3
...
110 64
111 65
112 66
113 114
114 67
Name: Dog_Types_Categories, Length: 115, dtype: int32
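The codes above can look arbitrary ('affenpinscher' gets 68 while 'Afghan hound' gets 0) because LabelEncoder numbers its classes in sorted order, and the default string ordering compares code points, so every capitalized name sorts before every lowercase one. A short pure-Python illustration on a handful of the breed names:

```python
breeds = ["affenpinscher", "Afghan hound", "Akita", "whippet", "Yorkshire terrier"]

# Default ordering compares code points: 'A'-'Z' (65-90) come before 'a'-'z' (97-122)
print(sorted(breeds))
# ['Afghan hound', 'Akita', 'Yorkshire terrier', 'affenpinscher', 'whippet']

# Comparing lowercased strings gives the alphabetical order a reader expects
print(sorted(breeds, key=str.lower))
# ['affenpinscher', 'Afghan hound', 'Akita', 'whippet', 'Yorkshire terrier']
```

Lowercasing the column first (for example with `dogs_df['Dog_Types'].str.lower()`) before calling `fit_transform` would therefore produce alphabetical codes.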
One Hot Encoding:
Using the scikit-learn library:
# Creating instance of one-hot-encoder
# handle_unknown='ignore': a category not seen during fit is encoded as all zeros by transform,
# and inverse_transform maps such a row back to None
enc = OneHotEncoder(handle_unknown='ignore')
# Pass the Dog_Types_Categories column (label-encoded values of dog_types)
enc_df = pd.DataFrame(enc.fit_transform(dogs_df[['Dog_Types_Categories']]).toarray())
# Join the one-hot columns to dogs_df on the shared index
dogs_df = dogs_df.join(enc_df)
dogs_df
Index | Dog_Types | Dog_Types_Categories | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | ... | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | affenpinscher | 68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | Afghan hound | 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | Airedale terrier | 1 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | Akita | 2 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | Alaskan Malamute | 3 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
110 | Weimaraner | 64 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
111 | Welsh terrier | 65 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
112 | West Highland white terrier | 66 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
113 | whippet | 114 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
114 | Yorkshire terrier | 67 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
115 rows × 117 columns
Get Dummies method:
# Re-create the original single-column dataframe (dog_types is the tuple defined in the label encoding task)
dogs_df = pd.DataFrame(dog_types, columns=['Dog_Types'])
dum_df = pd.get_dummies(dogs_df, columns=["Dog_Types"], prefix=["Type_is"])
# Join the dummy columns to dogs_df on the shared index
dogs_df = dogs_df.join(dum_df)
dogs_df
Index | Dog_Types | Type_is_Afghan hound | Type_is_Airedale terrier | Type_is_Akita | Type_is_Alaskan Malamute | Type_is_American Staffordshire terrier | Type_is_American water spaniel | Type_is_Australian cattle dog | Type_is_Australian shepherd | Type_is_Australian terrier | ... | Type_is_puli | Type_is_saluki | Type_is_schipperke | Type_is_schnauzer | Type_is_shih tzu | Type_is_silky terrier | Type_is_soft-coated wheaten terrier | Type_is_spitz | Type_is_vizsla | Type_is_whippet |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | affenpinscher | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
1 | Afghan hound | True | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
2 | Airedale terrier | False | True | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
3 | Akita | False | False | True | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
4 | Alaskan Malamute | False | False | False | True | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
110 | Weimaraner | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
111 | Welsh terrier | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
112 | West Highland white terrier | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
113 | whippet | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
114 | Yorkshire terrier | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
115 rows × 116 columns
b. To implement Bag of Words (BoW) feature engineering technique on textual data
Using a user-defined function after preprocessing:
doc1 = 'Game of Thrones is an amazing tv series!'
doc2 = 'Game of Thrones is the best tv series!'
doc3 = 'Game of Thrones is so great'
l_doc1 = re.sub(r"[^a-zA-Z0-9]", " ", doc1.lower()).split()
l_doc2 = re.sub(r"[^a-zA-Z0-9]", " ", doc2.lower()).split()
l_doc3 = re.sub(r"[^a-zA-Z0-9]", " ", doc3.lower()).split()
wordset12 = np.union1d(l_doc1,l_doc2)
wordset = np.union1d(wordset12,l_doc3)
print(wordset)
['amazing' 'an' 'best' 'game' 'great' 'is' 'of' 'series' 'so' 'the'
'thrones' 'tv']
def calculateBOW(wordset, l_doc):
    # Start every vocabulary word at 0, then record how often each word occurs in l_doc
    tf_diz = dict.fromkeys(wordset, 0)
    for word in l_doc:
        tf_diz[word] = l_doc.count(word)
    return tf_diz
bow1 = calculateBOW(wordset,l_doc1)
bow2 = calculateBOW(wordset,l_doc2)
bow3 = calculateBOW(wordset,l_doc3)
df_bow = pd.DataFrame([bow1,bow2,bow3])
df_bow.head()
Index | amazing | an | best | game | great | is | of | series | so | the | thrones | tv |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
2 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
Using the scikit-learn library:
doc1 = 'Game of Thrones is an amazing tv series!'
doc2 = 'Game of Thrones is the best tv series!'
doc3 = 'Game of Thrones is so great'
CountVec = CountVectorizer(ngram_range=(1,1), # to use bigrams ngram_range=(2,2)
stop_words='english')
# Learn the vocabulary and build the document-term matrix
Count_data = CountVec.fit_transform([doc1, doc2, doc3])
# Initializing the dataframe
cv_dataframe = pd.DataFrame(Count_data.toarray(), columns=CountVec.get_feature_names_out())
cv_dataframe.head()
Index | amazing | best | game | great | series | thrones | tv |
---|---|---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
2 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
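The CountVectorizer call above mentions bigrams (ngram_range=(2,2)). The pure-Python sketch below approximates what that setting would count for doc1: tokens are lowercased, stop words are dropped first, and adjacent pairs are then joined. The `bigram_counts` helper is hypothetical, the stop word set is a hand-picked stand-in containing only the words removed from these three documents, and the regular expression mimics CountVectorizer's default token pattern (two or more word characters).

```python
import re
from collections import Counter


def bigram_counts(doc, stop_words):
    # Lowercase, keep tokens of 2+ word characters, drop stop words, then pair adjacent tokens
    tokens = [t for t in re.findall(r"\b\w\w+\b", doc.lower()) if t not in stop_words]
    return Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))


stop_words = {"of", "is", "an", "the", "so"}  # hand-picked subset, not the full English list
print(bigram_counts('Game of Thrones is an amazing tv series!', stop_words))
```

Because stop words are removed before pairing, a bigram such as "game thrones" can span a removed word ("of").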
c. To implement TF-IDF feature engineering technique
documents = ["Inflation has increased unemployment",
"The company has increased its sales",
"Fear increased his pulse"]
# Preprocessing: lowercase, tokenize, and drop English stop words
stop_words = set(stopwords.words("english"))

def return_preprocessed_document(document):
    document = document.lower()
    # Word tokenization
    words = word_tokenize(document)
    # Stop word removal (a set built once avoids reloading the list for every word)
    words = [word for word in words if word not in stop_words]
    # Re-join the remaining words into a single string
    document = " ".join(words)
    return document
documents = [return_preprocessed_document(document) for document in documents]
documents
['inflation increased unemployment',
'company increased sales',
'fear increased pulse']
# Create a TF-IDF model with TfidfVectorizer (default settings)
vectorizer = TfidfVectorizer()
tfidf_model = vectorizer.fit_transform(documents)
print(tfidf_model)
(0, 6) 0.652490884512534
(0, 2) 0.3853716274664007
(0, 3) 0.652490884512534
(1, 5) 0.652490884512534
(1, 0) 0.652490884512534
(1, 2) 0.3853716274664007
(2, 4) 0.652490884512534
(2, 1) 0.652490884512534
(2, 2) 0.3853716274664007
pd.DataFrame(tfidf_model.toarray(), columns = vectorizer.get_feature_names_out())
Index | company | fear | increased | inflation | pulse | sales | unemployment |
---|---|---|---|---|---|---|---|
0 | 0.000000 | 0.000000 | 0.385372 | 0.652491 | 0.000000 | 0.000000 | 0.652491 |
1 | 0.652491 | 0.000000 | 0.385372 | 0.000000 | 0.000000 | 0.652491 | 0.000000 |
2 | 0.000000 | 0.652491 | 0.385372 | 0.000000 | 0.652491 | 0.000000 | 0.000000 |
From the output above, we can infer the following:
The TF-IDF model converts the sample sentences into a matrix that assigns higher weights to the words that distinguish each document: inflation and unemployment in sentence 1, company and sales in sentence 2, and fear and pulse in sentence 3 (0.652491 each). The word increased, which appears in every document, receives a lower weight (0.385372).
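As a check on this interpretation, the weights can be reproduced without scikit-learn. The following is a minimal from-scratch sketch that assumes TfidfVectorizer's defaults (raw counts, smoothed idf, L2 row normalization) and whitespace-tokenized, already-preprocessed input; the `tfidf` function is hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    # Smoothed TF-IDF with L2 row normalization; assumes every document has at least one token
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({w for doc in tokenized for w in doc})
    n = len(documents)
    # Document frequency: number of documents containing each word
    df = {w: sum(w in doc for doc in tokenized) for w in vocabulary}
    # Smoothed inverse document frequency
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocabulary}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[w] * idf[w] for w in vocabulary]
        norm = math.sqrt(sum(x * x for x in row))
        rows.append([x / norm for x in row])
    return vocabulary, rows


documents = ['inflation increased unemployment',
             'company increased sales',
             'fear increased pulse']
vocabulary, rows = tfidf(documents)
print(vocabulary)
for row in rows:
    print([round(x, 6) for x in row])
```

Here increased has idf ln(4/4) + 1 = 1 and every other word has idf ln(4/2) + 1 ≈ 1.693147, so after normalization each row holds 0.652491 for its two distinctive words and 0.385372 for increased, matching the DataFrame above.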
Additional task: classifying documents
# Print every whitespace-separated token in the file
with open("textfile.txt", 'r', encoding="mbcs") as file:  # "mbcs" is a Windows-only codec
    for i in file.read().split():
        print(i)
Medical:
Hospital
Emergency
Room
(ER)
Intensive
Care
Unit
(ICU)
Operating
Room
(OR)
Exam
Diagnosis
Prescription
Urine
sample
Blood
sample
Hypertension
Cast
Vein
Syringe
Painkiller/pain
reliever
Numb
Dosage
Biopsy
(of
abnormal
cells)
Finanace:
1.
Amortization:
Amortization
is
a
method
of
spreading
an
intangible
asset's
cost
over
the
course
of
its
useful
life.
Intangible
assets
are
non-physical
assets
that
are
essential
to
a
company,
such
as
a
trademark,
patent,
copyright,
or
franchise
agreement.
2.
Assets:
Assets
are
items
you
own
that
can
provide
future
benefit
to
your
business,
such
as
cash,
inventory,
real
estate,
office
equipment,
or
accounts
receivable,
which
are
payments
due
to
a
company
by
its
customers.
There
are
different
types
of
assets,
including:
Current
Assets:
Which
can
be
converted
to
cash
within
a
year
Fixed
Assets:
Which
can’t
immediately
be
turned
into
cash,
but
are
tangible
items
that
a
company
owns
and
uses
to
generate
long-term
income
3.
Asset
Allocation:
Asset
allocation
refers
to
how
you
choose
to
spread
your
money
across
different
investment
types,
also
known
as
asset
classes.
These
include:
Bonds:
Bonds
represent
a
form
of
borrowing.
When
you
buy
a
bond,
typically
from
the
government
or
a
corporation,
you’re
essentially
lending
them
money.
You
receive
periodic
interest
payments
and
get
back
the
loaned
amount
at
the
time
of
the
bond’s
maturity—or
the
defined
term
at
which
the
bond
can
be
redeemed.
Stocks:
A
stock
is
a
share
of
ownership
in
a
public
or
private
company.
When
you
buy
stock
in
a
company,
you
become
a
shareholder
and
can
receive
dividends—the
company’s
profits—if
and
when
they
are
distributed.
Cash
and
Cash
Equivalents:
This
refers
to
any
asset
in
the
form
of
cash,
or
which
can
be
converted
to
cash
easily
in
the
event
it's
necessary.
4.
Balance
Sheet:
A
balance
sheet
is
an
important
financial
statement
that
communicates
an
organization’s
worth,
or
“book
value.â€
The
balance
sheet
includes
a
tally
of
the
organization’s
assets,
liabilities,
and
shareholders’
equity
for
a
given
reporting
period.
The
Balance
Sheet
Equation:
Balance
sheets
are
arranged
according
to
the
following
equation:
Assets
=
Liabilities
+
Owners’
Equity
5.
Capital
Gain:
A
capital
gain
is
an
increase
in
the
value
of
an
asset
or
investment
above
the
price
you
initially
paid
for
it.
If
you
sell
the
asset
for
less
than
the
original
purchase
price,
that
would
be
considered
a
capital
loss.
Related:
6
Ways
Understanding
Finance
Can
Help
You
Excel
Professionally
6.
Capital
Market:
This
is
a
market
where
buyers
and
sellers
engage
in
the
trade
of
financial
assets,
including
stocks
and
bonds.
Capital
markets
feature
several
participants,
including:
Companies:
Firms
that
sell
stocks
and
bonds
to
investors
Institutional
investors:
Investors
who
purchase
stocks
and
bonds
on
behalf
of
a
large
capital
base
Mutual
funds:
A
mutual
fund
is
an
institutional
investor
that
manages
the
investments
of
thousands
of
individuals
Hedge
funds:
A
hedge
fund
is
another
type
of
institutional
investor,
which
controls
risk
through
hedging—a
process
of
buying
one
stock
and
then
shorting
a
similar
stock
to
make
money
from
the
difference
in
their
relative
performance
7.
Cash
Flow:
Cash
flow
refers
to
the
net
balance
of
cash
moving
in
and
out
of
a
business
at
a
specific
point
in
time.
Cash
flow
is
commonly
broken
into
three
categories,
including:
Operating
Cash
Flow:
The
net
cash
generated
from
normal
business
operations
Investing
Cash
Flow:
The
net
cash
generated
from
investing
activities,
such
as
securities
investments
and
the
purchase
or
sale
of
assets
Financing
Cash
Flow:
The
net
cash
generated
financing
a
business,
including
debt
payments,
shareholders’
equity,
and
dividend
payments
8.
Cash
Flow
Statement:
A
cash
flow
statement
is
a
financial
statement
prepared
to
provide
a
detailed
analysis
of
what
happened
to
a
company’s
cash
during
a
given
period
of
time.
This
document
shows
how
the
business
generated
and
spent
its
cash
by
including
an
overview
of
cash
flows
from
operating,
investing,
and
financing
activities
during
the
reporting
period.
9.
Compound
Interest:
This
refers
to
“interest
on
interest.â€
Rather,
when
you’re
investing
or
saving,
compound
interest
is
earned
on
the
amount
you
deposited,
plus
any
interest
you’ve
accumulated
over
time.
While
it
can
grow
your
savings,
it
can
also
increase
your
debt;
compound
interest
is
charged
on
the
initial
amount
you
were
loaned,
as
well
as
the
expenses
added
to
your
outstanding
balance
over
time.
10.
Depreciation:
Depreciation
represents
the
decrease
in
an
asset’s
value.
It’s
a
term
commonly
used
in
accounting
and
shows
how
much
of
an
asset’s
value
a
business
has
used
over
a
period
of
time.
Equity
# textfile.txt contains text with medical and finance related words.
# We read the file and decide which field the document belongs to by counting the words
# related to finance and to medicine and seeing which count is larger (simplest method).
# Step 1: count how often each whitespace-separated token occurs in the file.
d = {}
with open("textfile.txt", 'r', encoding="mbcs") as file:  # "mbcs" is a Windows-only codec
    for i in file.read().split():
        if i in d:
            d[i] += 1
        else:
            d[i] = 1
print(d)
{'Medical:': 1, 'Hospital': 1, 'Emergency': 1, 'Room': 2, '(ER)': 1, 'Intensive': 1, 'Care': 1, 'Unit': 1, '(ICU)': 1, 'Operating': 2, '(OR)': 1, 'Exam': 1, 'Diagnosis': 1, 'Prescription': 1, 'Urine': 1, 'sample': 2, 'Blood': 1, 'Hypertension': 1, 'Cast': 1, 'Vein': 1, 'Syringe': 1, 'Painkiller/pain': 1, 'reliever': 1, 'Numb': 1, 'Dosage': 1, 'Biopsy': 1, '(of': 1, 'abnormal': 1, 'cells)': 1, 'Finanace:': 1, '1.': 1, 'Amortization:': 1, 'Amortization': 1, 'is': 11, 'a': 29, 'method': 1, 'of': 23, 'spreading': 1, 'an': 9, 'intangible': 1, "asset's": 1, 'cost': 1, 'over': 4, 'the': 26, 'course': 1, 'its': 3, 'useful': 1, 'life.': 1, 'Intangible': 1, 'assets': 3, 'are': 8, 'non-physical': 1, 'that': 7, 'essential': 1, 'to': 17, 'company,': 2, 'such': 3, 'as': 6, 'trademark,': 1, 'patent,': 1, 'copyright,': 1, 'or': 9, 'franchise': 1, 'agreement.': 1, '2.': 1, 'Assets:': 3, 'Assets': 2, 'items': 2, 'you': 9, 'own': 1, 'can': 7, 'provide': 2, 'future': 1, 'benefit': 1, 'your': 5, 'business,': 2, 'cash,': 3, 'inventory,': 1, 'real': 1, 'estate,': 1, 'office': 1, 'equipment,': 1, 'accounts': 1, 'receivable,': 1, 'which': 4, 'payments': 3, 'due': 1, 'company': 2, 'by': 2, 'customers.': 1, 'There': 1, 'different': 2, 'types': 1, 'assets,': 3, 'including:': 3, 'Current': 1, 'Which': 2, 'be': 5, 'converted': 2, 'cash': 10, 'within': 1, 'year': 1, 'Fixed': 1, 'can’t': 1, 'immediately': 1, 'turned': 1, 'into': 2, 'but': 1, 'tangible': 1, 'owns': 1, 'and': 17, 'uses': 1, 'generate': 1, 'long-term': 1, 'income': 1, '3.': 1, 'Asset': 2, 'Allocation:': 1, 'allocation': 1, 'refers': 4, 'how': 3, 'choose': 1, 'spread': 1, 'money': 2, 'across': 1, 'investment': 2, 'types,': 1, 'also': 2, 'known': 1, 'asset': 4, 'classes.': 1, 'These': 1, 'include:': 1, 'Bonds:': 1, 'Bonds': 1, 'represent': 1, 'form': 2, 'borrowing.': 1, 'When': 2, 'buy': 2, 'bond,': 1, 'typically': 1, 'from': 5, 'government': 1, 'corporation,': 1, 'you’re': 2, 'essentially': 1, 'lending': 1, 'them': 1, 'money.': 1, 
'You': 2, 'receive': 2, 'periodic': 1, 'interest': 4, 'get': 1, 'back': 1, 'loaned': 1, 'amount': 3, 'at': 3, 'time': 1, 'bond’s': 1, 'maturity—or': 1, 'defined': 1, 'term': 2, 'bond': 1, 'redeemed.': 1, 'Stocks:': 1, 'A': 6, 'stock': 4, 'share': 1, 'ownership': 1, 'in': 11, 'public': 1, 'private': 1, 'company.': 1, 'become': 1, 'shareholder': 1, 'dividends—the': 1, 'company’s': 2, 'profits—if': 1, 'when': 2, 'they': 1, 'distributed.': 1, 'Cash': 9, 'Equivalents:': 1, 'This': 4, 'any': 2, 'easily': 1, 'event': 1, "it's": 1, 'necessary.': 1, '4.': 1, 'Balance': 3, 'Sheet:': 1, 'balance': 4, 'sheet': 2, 'important': 1, 'financial': 3, 'statement': 3, 'communicates': 1, 'organization’s': 2, 'worth,': 1, '“book': 1, 'value.â€\x9d': 1, 'The': 5, 'includes': 1, 'tally': 1, 'liabilities,': 1, 'shareholders’': 2, 'equity': 1, 'for': 3, 'given': 2, 'reporting': 2, 'period.': 2, 'Sheet': 1, 'Equation:': 1, 'sheets': 1, 'arranged': 1, 'according': 1, 'following': 1, 'equation:': 1, '=': 1, 'Liabilities': 1, '+': 1, 'Owners’': 1, 'Equity': 2, '5.': 1, 'Capital': 3, 'Gain:': 1, 'capital': 3, 'gain': 1, 'increase': 2, 'value': 2, 'above': 1, 'price': 1, 'initially': 1, 'paid': 1, 'it.': 1, 'If': 1, 'sell': 2, 'less': 1, 'than': 1, 'original': 1, 'purchase': 3, 'price,': 1, 'would': 1, 'considered': 1, 'loss.': 1, 'Related:': 1, '6': 1, 'Ways': 1, 'Understanding': 1, 'Finance': 1, 'Can': 1, 'Help': 1, 'Excel': 1, 'Professionally': 1, '6.': 1, 'Market:': 1, 'market': 1, 'where': 1, 'buyers': 1, 'sellers': 1, 'engage': 1, 'trade': 1, 'including': 3, 'stocks': 3, 'bonds.': 1, 'markets': 1, 'feature': 1, 'several': 1, 'participants,': 1, 'Companies:': 1, 'Firms': 1, 'bonds': 2, 'investors': 1, 'Institutional': 1, 'investors:': 1, 'Investors': 1, 'who': 1, 'on': 4, 'behalf': 1, 'large': 1, 'base': 1, 'Mutual': 1, 'funds:': 2, 'mutual': 1, 'fund': 2, 'institutional': 2, 'investor': 1, 'manages': 1, 'investments': 2, 'thousands': 1, 'individuals': 1, 'Hedge': 1, 'hedge': 1, 'another': 
1, 'type': 1, 'investor,': 1, 'controls': 1, 'risk': 1, 'through': 1, 'hedging—a': 1, 'process': 1, 'buying': 1, 'one': 1, 'then': 1, 'shorting': 1, 'similar': 1, 'make': 1, 'difference': 1, 'their': 1, 'relative': 1, 'performance': 1, '7.': 1, 'Flow:': 4, 'flow': 3, 'net': 4, 'moving': 1, 'out': 1, 'business': 4, 'specific': 1, 'point': 1, 'time.': 5, 'commonly': 2, 'broken': 1, 'three': 1, 'categories,': 1, 'generated': 4, 'normal': 1, 'operations': 1, 'Investing': 1, 'investing': 2, 'activities,': 1, 'securities': 1, 'sale': 1, 'Financing': 1, 'financing': 2, 'debt': 1, 'payments,': 1, 'equity,': 1, 'dividend': 1, '8.': 1, 'Flow': 1, 'Statement:': 1, 'prepared': 1, 'detailed': 1, 'analysis': 1, 'what': 1, 'happened': 1, 'during': 2, 'period': 2, 'document': 1, 'shows': 2, 'spent': 1, 'overview': 1, 'flows': 1, 'operating,': 1, 'investing,': 1, 'activities': 1, '9.': 1, 'Compound': 1, 'Interest:': 1, '“interest': 1, 'interest.â€\x9d': 1, 'Rather,': 1, 'saving,': 1, 'compound': 2, 'earned': 1, 'deposited,': 1, 'plus': 1, 'you’ve': 1, 'accumulated': 1, 'While': 1, 'it': 2, 'grow': 1, 'savings,': 1, 'debt;': 1, 'charged': 1, 'initial': 1, 'were': 1, 'loaned,': 1, 'well': 1, 'expenses': 1, 'added': 1, 'outstanding': 1, '10.': 1, 'Depreciation:': 1, 'Depreciation': 1, 'represents': 1, 'decrease': 1, 'asset’s': 2, 'value.': 1, 'It’s': 1, 'used': 2, 'accounting': 1, 'much': 1, 'has': 1}
import string

# Keywords for each field; matching is case-insensitive
medical_words = ["Medical", "Prescription", "hospital", "health", "exam", "Blood"]
finance_words = ["Invest", "market", "payment", "Withdraw", "Cash", "Depreciation", "Equity"]
medical_set = {w.lower() for w in medical_words}
finance_set = {w.lower() for w in finance_words}

med_d = {}
finan_d = {}
with open("textfile.txt", 'r', encoding="mbcs") as file:
    for i in file.read().split():
        # Normalize the token: lowercase it and strip surrounding punctuation such as ':' or ','
        # (exact matching still misses inflected forms such as 'payments')
        word = i.lower().strip(string.punctuation)
        if word in medical_set:
            med_d[word] = med_d.get(word, 0) + 1
        elif word in finance_set:
            finan_d[word] = finan_d.get(word, 0) + 1
print(med_d)
print(finan_d)

# The field with more keyword hits is the predicted label
med_total, finan_total = sum(med_d.values()), sum(finan_d.values())
print(f"Medical score: {med_total}, finance score: {finan_total}")
print("Predicted field:", "Medical" if med_total > finan_total else "Finance")
{'medical': 1, 'hospital': 1, 'exam': 1, 'prescription': 1, 'blood': 1}
{'cash': 22, 'equity': 4, 'market': 2, 'depreciation': 2}
Medical score: 5, finance score: 30
Predicted field: Finance
'receive': 1, 'periodic': 1, 'interest': 1, 'get': 1, 'back': 1, 'loaned': 1, 'amount': 1, 'at': 1, 'time': 1, 'bond’s': 1, 'maturity—or': 1, 'defined': 1, 'term': 1, 'bond': 1, 'redeemed.': 1, 'Stocks:': 1, 'A': 1, 'stock': 1, 'share': 1, 'ownership': 1, 'in': 1, 'public': 1, 'private': 1, 'company.': 1, 'become': 1, 'shareholder': 1, 'dividends—the': 1, 'company’s': 1, 'profits—if': 1, 'when': 1, 'they': 1, 'distributed.': 1, 'Cash': 9, 'Equivalents:': 1, 'This': 1, 'any': 1, 'easily': 1, 'event': 1, "it's": 1, 'necessary.': 1, '4.': 1, 'Balance': 1, 'Sheet:': 1, 'balance': 1, 'sheet': 1, 'important': 1, 'financial': 1, 'statement': 1, 'communicates': 1, 'organization’s': 1, 'worth,': 1, '“book': 1, 'value.â€\x9d': 1, 'The': 1, 'includes': 1, 'tally': 1, 'liabilities,': 1, 'shareholders’': 1, 'equity': 1, 'for': 1, 'given': 1, 'reporting': 1, 'period.': 1, 'Sheet': 1, 'Equation:': 1, 'sheets': 1, 'arranged': 1, 'according': 1, 'following': 1, 'equation:': 1, '=': 1, 'Liabilities': 1, '+': 1, 'Owners’': 1, 'Equity': 2, '5.': 1, 'Capital': 1, 'Gain:': 1, 'capital': 1, 'gain': 1, 'increase': 1, 'value': 1, 'above': 1, 'price': 1, 'initially': 1, 'paid': 1, 'it.': 1, 'If': 1, 'sell': 1, 'less': 1, 'than': 1, 'original': 1, 'purchase': 1, 'price,': 1, 'would': 1, 'considered': 1, 'loss.': 1, 'Related:': 1, '6': 1, 'Ways': 1, 'Understanding': 1, 'Finance': 1, 'Can': 1, 'Help': 1, 'Excel': 1, 'Professionally': 1, '6.': 1, 'Market:': 1, 'market': 1, 'where': 1, 'buyers': 1, 'sellers': 1, 'engage': 1, 'trade': 1, 'including': 1, 'stocks': 1, 'bonds.': 1, 'markets': 1, 'feature': 1, 'several': 1, 'participants,': 1, 'Companies:': 1, 'Firms': 1, 'bonds': 1, 'investors': 1, 'Institutional': 1, 'investors:': 1, 'Investors': 1, 'who': 1, 'on': 1, 'behalf': 1, 'large': 1, 'base': 1, 'Mutual': 1, 'funds:': 1, 'mutual': 1, 'fund': 1, 'institutional': 1, 'investor': 1, 'manages': 1, 'investments': 1, 'thousands': 1, 'individuals': 1, 'Hedge': 1, 'hedge': 1, 'another': 1, 'type': 
1, 'investor,': 1, 'controls': 1, 'risk': 1, 'through': 1, 'hedging—a': 1, 'process': 1, 'buying': 1, 'one': 1, 'then': 1, 'shorting': 1, 'similar': 1, 'make': 1, 'difference': 1, 'their': 1, 'relative': 1, 'performance': 1, '7.': 1, 'Flow:': 1, 'flow': 1, 'net': 1, 'moving': 1, 'out': 1, 'business': 1, 'specific': 1, 'point': 1, 'time.': 1, 'commonly': 1, 'broken': 1, 'three': 1, 'categories,': 1, 'generated': 1, 'normal': 1, 'operations': 1, 'Investing': 1, 'investing': 1, 'activities,': 1, 'securities': 1, 'sale': 1, 'Financing': 1, 'financing': 1, 'debt': 1, 'payments,': 1, 'equity,': 1, 'dividend': 1, '8.': 1, 'Flow': 1, 'Statement:': 1, 'prepared': 1, 'detailed': 1, 'analysis': 1, 'what': 1, 'happened': 1, 'during': 1, 'period': 1, 'document': 1, 'shows': 1, 'spent': 1, 'overview': 1, 'flows': 1, 'operating,': 1, 'investing,': 1, 'activities': 1, '9.': 1, 'Compound': 1, 'Interest:': 1, '“interest': 1, 'interest.â€\x9d': 1, 'Rather,': 1, 'saving,': 1, 'compound': 1, 'earned': 1, 'deposited,': 1, 'plus': 1, 'you’ve': 1, 'accumulated': 1, 'While': 1, 'it': 1, 'grow': 1, 'savings,': 1, 'debt;': 1, 'charged': 1, 'initial': 1, 'were': 1, 'loaned,': 1, 'well': 1, 'expenses': 1, 'added': 1, 'outstanding': 1, '10.': 1, 'Depreciation:': 1, 'Depreciation': 1, 'represents': 1, 'decrease': 1, 'asset’s': 1, 'value.': 1, 'It’s': 1, 'used': 1, 'accounting': 1, 'much': 1, 'has': 1}
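The dictionaries above appear to be keyed by whitespace-separated tokens. Punctuation and capitalization therefore stay attached, and one word is spread across several keys, for example 'Assets:', 'Assets', 'assets' and 'assets,'. The sketch below is a minimal illustration and not the code that produced the output above. It assumes a plain string as input and uses a hypothetical helper, `word_frequencies`, which lowercases the text and keeps only runs of letters before counting with `collections.Counter`. The sample string is modeled on the finance text.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return a Counter of lowercased alphabetic tokens in text (hypothetical helper)."""
    # Lowercase first so 'Assets' and 'assets' map to the same key,
    # then keep only runs of letters so trailing ':' or ',' are dropped.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

sample = "2. Assets: Assets are items you own. There are different types of assets, including:"
freq = word_frequencies(sample)
print(freq["assets"])       # 3
print(freq.most_common(2))  # [('assets', 3), ('are', 2)]
```

The pattern `[a-z]+` is an assumption. It drops digits and splits contractions such as "can’t" into two tokens, so it may need adjusting for a given corpus.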
# Count the entries with a value of exactly 2 in each dictionary
finan_count = sum(1 for x in finan_d.values() if x == 2)
med_count = sum(1 for x in med_d.values() if x == 2)
if finan_count > med_count:
    print("The document is related to finance")
else:
    print("The document is related to medical")
The document is related to finance
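The rule above picks the domain from how many dictionary entries have a value of exactly 2. A more direct rule counts how often each domain's keywords occur in the document and chooses the domain with more hits. The sketch below shows that rule under stated assumptions. The keyword sets `MEDICAL_TERMS` and `FINANCE_TERMS` and the function `classify_domain` are hypothetical. The terms are drawn from words in the two lists above, and a real run would substitute the experiment's own vocabularies.

```python
import re
from collections import Counter

# Hypothetical keyword sets, drawn from terms that appear in the word lists above
MEDICAL_TERMS = {"hospital", "diagnosis", "prescription", "syringe",
                 "dosage", "biopsy", "blood", "vein"}
FINANCE_TERMS = {"cash", "assets", "equity", "bonds",
                 "stocks", "interest", "depreciation", "amortization"}

def classify_domain(text):
    """Label text 'finance' or 'medical' by counting keyword occurrences (a sketch)."""
    # Normalize to lowercase alphabetic tokens before counting
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    finance_hits = sum(counts[t] for t in FINANCE_TERMS)
    medical_hits = sum(counts[t] for t in MEDICAL_TERMS)
    # Ties fall through to "medical", mirroring the else branch of the rule above
    return "finance" if finance_hits > medical_hits else "medical"

doc = "Cash and cash equivalents are assets; equity is what remains after liabilities."
print(f"The document is related to {classify_domain(doc)}")  # The document is related to finance
```

Matching single tokens misses multi-word terms such as "intensive care unit". Catching those would need n-gram or phrase matching.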