We have previously called the .lower() method to turn all of the words lowercase, so that strings like "The" and "the" both become "the" and we don't double count them.
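As a quick sketch (the token list here is made up for illustration):

```python
# Hypothetical token list; .lower() folds case so "The" and "the" merge.
tokens = ["The", "quick", "fox", "saw", "the", "other", "fox"]
lowered = [t.lower() for t in tokens]
print(lowered.count("the"))  # counts both "The" and "the": 2
```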
For example, we can strip the affixes from words in a process called stemming. In the word "preprocessing", there's a prefix "pre-" and a suffix "-ing", and stripping both leaves the stem "process".
NLTK has several stemmers. You can make your own using regular expressions, but the NLTK stemmers handle many irregular cases. Two commonly used stemmers are Porter and Lancaster:
lancaster = nltk.LancasterStemmer()
[lancaster.stem(t) for t in tokens]
# ['den'…
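The Porter stemmer is used the same way; a small sketch with a made-up token list (Porter tends to be more conservative than Lancaster):

```python
import nltk

porter = nltk.PorterStemmer()
# Hypothetical tokens, just to show the call pattern.
tokens = ['preprocessing', 'running', 'flies']
print([porter.stem(t) for t in tokens])
```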
NLTK has preprocessed texts. But we can also import and process our own texts.
import nltk, re, pprint
import urllib.request

url = "https://www.gutenberg.org/files/2554/2554-0.txt"
raw = urllib.request.urlopen(url).read().decode('utf8')
type(raw)
# <class 'str'>
len(raw)
raw[:75]
# 'The Project Gutenberg EBook of Crime and Punishment, by Fyodor Dostoevsky\r\n'
tokens = nltk.word_tokenize(raw)
type(tokens)
# <class 'list'>
len(tokens)
tokens[:10]
# ['The', 'Project', 'Gutenberg', 'EBook', 'of', 'Crime', 'and', 'Punishment', ',', 'by']
Textization, or just turning the tokens into NLTK's Text object, lets us run things like collocations:
text = nltk.Text(tokens)
type(text)
# <class 'nltk.text.Text'>…
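nltk.Text is essentially a thin wrapper around a token list; a tiny sketch with made-up tokens (methods like collocations() need a much larger text to find anything interesting):

```python
import nltk

tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat']
text = nltk.Text(tokens)
print(type(text))         # <class 'nltk.text.Text'>
print(text.count('the'))  # 2
```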
Previously, we talked about how languages are studied using the notion of a formal language. A formal language is a mathematical construct that uses set theory to describe a language and study its properties.
We introduced the notion of a string: a finite sequence of symbols, such as characters or letters. Then we formally defined the alphabet, which is a set of symbols. The alphabet goes hand in hand with the language, because a formal language is defined as a set of strings over a given alphabet.
Then we explored some operations on the string.
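In plain Python, the usual formal-language operations on strings look like this:

```python
s, t = 'pre', 'fix'
print(s + t)        # concatenation: 'prefix'
print(s * 2)        # repetition: 'prepre'
print(len(s + t))   # length: 6
print((s + t)[:3])  # a prefix of the string, via slicing: 'pre'
```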
Work in Natural Language Processing typically uses large bodies of linguistic data. In this article, we explore some lexical resources that help us ingest and analyze corpora. These resources are part of Python or the NLTK library.
We can access pre-imported corpora in NLTK in one of two ways:
emma = nltk.Text(nltk.corpus.gutenberg.words('austen-emma.txt'))
or like this:
from nltk.corpus import gutenberg
emma = gutenberg.words('austen-emma.txt')
We can write a quick little script to display a bunch of standard language statistics, like average word length, average sentence length, and lexical diversity.
It turns out that average word length is a universal attribute of…
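A minimal sketch of such a script; the helper name language_stats and the toy corpus below are my own, but with NLTK you would pass in gutenberg.raw(fileid), gutenberg.words(fileid), and gutenberg.sents(fileid) for each file:

```python
def language_stats(raw, words, sents):
    # Average word length (chars per word, whitespace included),
    # average sentence length (words per sentence), and lexical
    # diversity (words per distinct lowercased word type).
    num_vocab = len(set(w.lower() for w in words))
    return (len(raw) / len(words),
            len(words) / len(sents),
            len(words) / num_vocab)

# Toy corpus standing in for gutenberg.raw/.words/.sents.
words = ['The', 'cat', 'sat', '.', 'The', 'cat', 'ran', '.']
sents = [words[:4], words[4:]]
raw = ' '.join(words)
print(language_stats(raw, words, sents))  # (3.375, 4.0, 1.6)
```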
Okay, so I’m gonna review After Effects a bit. Adobe’s very own tutorials are actually pretty good: nice pace, nicely divided into small videos, with notes.
Design composition, I think, is a lot like photography composition.
Okay so let’s try to put everything into a remixed graphic.
I wanna mix a statue with a photo. And add these other things:
I was watching this video for 2021 design trends. And illustration or certain types of illustration were in trend. What was even more reassuring was that what was in trend seemed to be simpler things to draw.
It seemed doable.
And basically it’s very clear how real designers and digital artists do things these days: Adobe Illustrator and ProCreate.
So I thought I could learn ProCreate, then learn how to draw, then learn some Illustrator.
So, two apps and these illustration-related topics:
I looked up YouTube and SkillShare courses and found these:
Why graphic design? Because basically everything I do could benefit from amazing design skills!
From video to websites and apps, making things look good is an essential skill for anyone doing any kind of creative and visual work.