Major Goal of Book
The book is similar to "Think Python: How to Think Like a Computer Scientist", but it is more focused on data than on algorithms and abstractions. Because of that data science focus, the "Think Python" book still has more to offer for anyone who wants to go deeper.
My goal is to reinforce some of the fundamentals that I skipped when I dove into the deep end at work getting a large (12,000-line) program running. I learned a lot during that process, but I've found that I missed some fundamentals and have relied on searching for how to do things instead of knowing how to do them.
This chapter is mostly about what programming is and the introductory basics of how the syntax works. There are bits about how to install Python, how to run "Hello World", and how to quit the terminal interpreter.
There is a section on the differences between machine language and high-level languages like Python, JavaScript, and Ruby. The main feature of high-level languages is that they can run on different hardware without changes, because an interpreter or compiler translates them into the machine code needed by the specific system they run on. An interpreter reads the source code of a program and executes it on the fly, while a compiler needs to be given the whole program in a file, which it then translates into machine code that can be run later.
Errors will pop up while programming and it is important to distinguish between:
- Syntax Errors: Errors that break Python's grammar rules. Python reports them when it reads the code, before any of it runs, and usually points at the line where it got confused.
- Logic Errors: Errors that don't prevent the program from running, but the statements are in the wrong order or relate to each other incorrectly, which creates bad results.
- Semantic Errors: Errors that don't prevent the program from running and may appear to work correctly, but the instructions themselves are wrong (turn right instead of left).
Debugging will need to be done to fix the errors. There are four main things that can be tried to solve whatever issue there is:
- Reading: Looking at the code and reading it back to yourself to see if it matches what you wanted it to say.
- Running: Make changes to the code and run it to see if it displays a more helpful error or makes the problem more obvious. This might require adding some extra code, like logging or printing to the terminal.
- Ruminating: Take some time to think about what type of error you're dealing with and what information the error messages give you. Consider what has changed since the code last worked and which areas could be affected by those changes.
- Retreating: Undo some changes and get it back to a working state. From there it is possible to rebuild on a more solid foundation.
Beginners often get stuck using one of these approaches instead of trying the others. Sometimes it is best to keep breaking the code down into smaller blocks so that errors are easier to find and fix.
The learning process is a journey. It is ok if it takes time to really grasp the different concepts and commit them to memory. It can take time to go from just understanding the basics to writing full and complex programs.
Variables are names that refer to a value in Python. They can't start with a number, contain illegal characters like "@", or be one of Python's reserved keywords:
| and | del | from | None | True |
| as | elif | global | nonlocal | try |
| assert | else | if | not | while |
| break | except | import | or | with |
| class | False | in | pass | yield |
| continue | finally | is | raise | async |
| def | for | lambda | return | await |
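
Python can report its own keyword list, which is handy for checking a name before using it. This is a small sketch using the standard-library keyword module and str.isidentifier(); the helper name is_valid_variable_name is hypothetical and made up for this example.

```python
import keyword

def is_valid_variable_name(name):
    # Hypothetical helper: a valid name must follow the identifier rules
    # (no leading digit, no characters like "@") and must not be a keyword.
    return name.isidentifier() and not keyword.iskeyword(name)

print(len(keyword.kwlist))                    # number of keywords in this Python version
print(is_valid_variable_name("total"))        # True
print(is_valid_variable_name("76trombones"))  # False: starts with a number
print(is_valid_variable_name("more@"))        # False: illegal character
print(is_valid_variable_name("class"))        # False: reserved keyword
```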
A statement is a unit of code that Python can execute.
Operators are the symbols used for computations, such as "+", "-", "*", "/", and "**" (exponentiation). A full list of operators can be found in the Python documentation. The modulus operator "%" is fairly useful since it returns the remainder after the first number is divided by the second. A remainder of 0 means the first number is divisible by the second. Operands are the values the operators are applied to.
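A quick sketch of the modulus operator in action; is_divisible is a hypothetical helper name, not something built into Python.

```python
# % returns the remainder after dividing the left operand by the right one
print(7 % 3)    # 1, because 7 = 2 * 3 + 1
print(12 % 4)   # 0, so 12 is divisible by 4

def is_divisible(a, b):
    # Hypothetical helper: a remainder of 0 means b divides a evenly
    return a % b == 0

print(is_divisible(12, 4))  # True
print(is_divisible(7, 3))   # False
```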
An expression is a combination of values, variables, and operators. In an interactive terminal, an expression is evaluated and its value is displayed, but in a script an expression on its own doesn't display anything; any output has to be produced explicitly, for example with print().
Python follows PEMDAS for Order of Operations as that is the mathematical convention. Parentheses should be added if there is any question about the order of operations as well as to help with readability.
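A few expressions that show the precedence rules, and how parentheses change the result. One detail worth knowing: operators with the same precedence are evaluated left to right, except "**", which groups right to left.

```python
# Exponentiation happens before multiplication
print(2 * 3 ** 2)    # 18, i.e. 2 * 9
print((2 * 3) ** 2)  # 36, parentheses force the multiplication first

# Multiplication and division happen before addition and subtraction
print(1 + 2 * 3)     # 7
print((1 + 2) * 3)   # 9

# Same precedence: left to right for / ...
print(8 / 4 / 2)     # 1.0, i.e. (8 / 4) / 2
# ... but right to left for **
print(2 ** 3 ** 2)   # 512, i.e. 2 ** 9
```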
Some operators also work on strings: "+" concatenates two strings together, and "*" repeats a string a given number of times (the other operand must be an integer).
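A short sketch of the two string operators. It also shows a common stumbling block: "+" won't mix a string with a number, so the number has to be converted with str() first.

```python
first = "Hello, "
second = "world"
print(first + second)   # Hello, world  (concatenation)
print("ab" * 3)         # ababab        (repetition)
print("-" * 10)         # ----------    (handy for separators)

# "Count: " + 3 would raise a TypeError; convert the number first
count = 3
print("Count: " + str(count))  # Count: 3
```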
The input() function gets input from the user and resumes the program when the user presses Enter. A string can be passed as the argument to display a prompt describing what is being asked for. input() always returns a string, so numbers need to be converted with int() or float() before doing math with them.
Comments can be added in Python by using the "#" mark, either at the beginning of a line or towards the end of one, since everything after the "#" is ignored. It is good to let the code speak for itself when it can and to use comments to describe why a bit of code is doing what it does. That also means the code needs to be structured so that it is easy to read and understand. Mnemonic variable names help make the code readable from the start, while comments can fill in the gaps of why it was done the way it was.
Conditionals are built with comparison operators, which evaluate to either True or False.
| Code | Meaning |
|---|---|
| x == y | x is equal to y |
| x != y | x is not equal to y |
| x > y | x is greater than y |
| x < y | x is less than y |
| x >= y | x is greater than or equal to y |
| x <= y | x is less than or equal to y |
| x is y | x is the same object as y |
| x is not y | x is not the same object as y |
* Note that there is an important difference between == and is. == compares the values, while is checks whether both names refer to the exact same object.
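A small sketch of the difference, using two lists that hold the same values but are separate objects:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a          # c now refers to the same list object as a

print(a == b)  # True:  same values
print(a is b)  # False: two separate list objects
print(a is c)  # True:  both names refer to one object

# `is` is the idiomatic way to test for None
value = None
print(value is None)  # True
```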
These conditionals can be combined by using and, or, and not. Their meaning is similar to their meaning in standard English.
Since many functions need some form of logic to process information, these conditionals are often used in if statements, which have the structure below:
```python
if x > 0:
    print('x is positive')
```
The if keyword is followed by a condition and ends with a colon (:). The next line must be indented, along with every other line that belongs inside the if statement. for loops use the same structure. pass can be used as a placeholder body (along with a comment) when the logic is planned but the code isn't written yet.
if statements can be extended with an alternative execution by using else, and additional conditions can be chained by using elif, which is short for "else if". An elif can only come after the initial if (or another elif), and an else can only come at the end of a chain of conditionals. The conditions are always checked in order, and once one is true, the rest are skipped. Conditionals can also be nested as many times as desired, but that quickly becomes difficult to read. Ideally, code is kept to 3 levels of indentation or less.
```python
if x > 0:
    print('x is positive')
elif x == 0:
    print('x is 0')
else:
    print('x is negative')
```
The elif condition above is only checked when x > 0 is False (x <= 0), and the else block only runs when both conditions are False (x < 0).
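Nesting can often be flattened with a logical operator, which keeps the indentation shallow. The two functions below are hypothetical examples that behave identically; the second replaces the nested if with a single and condition.

```python
def describe_nested(x):
    if 0 < x:
        if x < 10:
            return "single-digit positive number"
    return "something else"

def describe_flat(x):
    # One combined condition replaces the nested if.
    # Python also accepts the chained form: 0 < x < 10
    if 0 < x and x < 10:
        return "single-digit positive number"
    return "something else"

for value in [-5, 0, 3, 10, 42]:
    print(value, describe_nested(value), describe_flat(value))
```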
When functions are created, they can throw errors if the wrong inputs are given. That isn't much of a concern when working in a live interpreter, but in a script an unhandled error stops the whole program. try and except form a conditional execution structure that lets Python catch errors and deal with them. They are best used around user input, to ensure valid inputs, or around calls to external resources that might return unexpected values.
Python evaluates logical expressions from left to right, and if the first condition of an and expression is False, the remaining conditions are skipped without being evaluated. This is called short-circuit evaluation, and it enables the guardian pattern, where an earlier condition protects a later one from running. In the code below, y != 0 guards the division: if y is 0, evaluation stops before x/y is attempted, so no ZeroDivisionError is raised. (Here x >= 2 is already False, so evaluation actually stops even earlier.)
```python
>>> x = 1
>>> y = 0
>>> x >= 2 and y != 0 and (x/y) > 2
False
```
Functions are named sequences of statements that perform some kind of computation. When a function is defined, you specify its name and the parameters it needs in order to perform the computation.
Python has several built-in functions that can be used anywhere. The names of these built-in functions should be treated like reserved words and shouldn't be reused as variable names.
| Function | Action |
|---|---|
| int() | Converts strings (of whole numbers) or floats into an int. Note that a float is truncated, with the decimal part chopped off instead of rounded |
| float() | Converts strings or ints into floating point numbers |
| str() | Converts its argument into a string |
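
A few conversions that show the truncation behavior of int(), plus the error raised when a string doesn't look like a whole number:

```python
print(int(3.99))     # 3, the decimal part is dropped, not rounded
print(int(-2.5))     # -2, truncation goes toward zero
print(int("32"))     # 32
print(float(32))     # 32.0
print(float("3.5"))  # 3.5

text = str(32)
print(text, type(text))  # 32 <class 'str'>

# A string must look like a whole number for int() to accept it
try:
    int("3.5")
except ValueError as err:
    print("ValueError:", err)
```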
Python also has a built-in math module, which needs to be imported by adding import math to the beginning of the Python file. This creates a module object, and its functions are accessed with dot notation, like math.log10() or math.sin(math.pi).
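A short sketch of calling functions from the math module. It also shows that floating point results are approximations: math.sin(math.pi) comes out as a tiny number rather than exactly 0, so math.isclose() is the safer way to compare.

```python
import math

print(math.log10(100))         # 2.0
print(math.sqrt(2) / 2)        # about 0.7071
print(math.sin(math.pi / 2))   # 1.0

# sin(pi) is mathematically 0, but the float result is only close to 0
print(math.sin(math.pi))
print(math.isclose(math.sin(math.pi), 0.0, abs_tol=1e-9))  # True
```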
Generally we want computer programs to be deterministic, returning the same result every time they are run. This is normally good, but sometimes nondeterministic results are desired, which requires introducing some randomness. Computers have a hard time generating truly random numbers, but they can generate pseudorandom numbers. Python does this with the built-in random module, which needs to be imported. It contains multiple functions that can be used to create random data.
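A sketch of a few functions from the standard random module. The last part shows random.seed(), which makes the pseudorandom sequence repeatable; that can be handy when debugging code that uses randomness.

```python
import random

# random.random() returns a float in the range [0.0, 1.0)
for _ in range(3):
    print(random.random())

# randint(low, high) includes both end points
print(random.randint(5, 10))

# choice() picks one element from a sequence
print(random.choice([1, 2, 3]))

# Seeding with the same value reproduces the same "random" sequence
random.seed(42)
first_run = [random.randint(1, 6) for _ in range(5)]
random.seed(42)
second_run = [random.randint(1, 6) for _ in range(5)]
print(first_run == second_run)  # True
```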
So far, mostly built-in functions have been used, but it is also possible to define new functions. A function definition specifies the name of a new function and the sequence of statements that are executed when the function is called. This helps to reduce code duplication.
Example:
```python
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")
```
def is a keyword that indicates that a new function is going to be defined. It is followed by the name of the function, in this case print_lyrics. Inside the parentheses, the function's parameters can be listed as needed, but there is no need to have any if they aren't needed, like in this case. The first line of the function definition is called the header and the rest is called the body. The header must end with a colon and the body must be indented. The general convention is to use 4 spaces for indentation. The function definition ends when there are no more lines or when a line is un-indented.
Functions can be defined in Python's interactive mode, where the prompt changes from >>> to ... to indicate that the definition isn't complete. To end the function definition in interactive mode, enter an empty line.
Defining a new function creates a variable with the name of the function that refers to a function object (in Python, functions are objects too). The syntax for calling a new function is the same as for built-in functions, so the previous function can be called with print_lyrics(). Since it can be called, it can also be used inside other functions like so...
```python
def repeat_lyrics():
    print_lyrics()
    print_lyrics()
```
To put the two code fragments together they would look like...
```python
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")

def repeat_lyrics():
    print_lyrics()
    print_lyrics()

repeat_lyrics()
```
This code defines two functions, and the order of the statements matters: a function must be defined before it is called. The code inside a function won't run until the function is called, but the def statement that creates it must have run first.
The flow of execution is essential to understand since it determines the order in which your program actually runs. Python programs always execute one statement at a time from the first line down, but the statements inside a function definition are skipped until the function is called; at that point execution jumps into the function body and then returns to where it left off. This matters for reading code as well as structuring it, since reading strictly from the top down might not match the order in which the statements actually run.
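A small sketch that records the order in which statements run. The list named steps and the function greet are made up for illustration.

```python
steps = []  # records the order in which statements actually run

def greet():
    # Skipped when the def is first read; runs only when greet() is called
    steps.append("inside greet")

steps.append("before call")
greet()
steps.append("after call")

print(steps)  # ['before call', 'inside greet', 'after call']
```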
Some functions take arguments, like math.pow(2, 3), which takes two. Those arguments are passed into the function and assigned to variables called parameters.
```python
def print_twice(bruce):
    print(bruce)
    print(bruce)
```
This function takes one argument and assigns it to the parameter bruce. Whenever the function is called, it prints the value of the parameter twice, no matter what it is, as long as it can be printed.
```python
>>> print_twice(17)
17
17
>>> import math
>>> print_twice(math.pi)
3.141592653589793
3.141592653589793
```
The standard rules of composition still apply, so any kind of expression can be used as an argument:
```python
>>> print_twice(math.cos(math.pi))
-1.0
-1.0
```
The argument is evaluated before the function is called, so the expression math.cos(math.pi) is only evaluated once. Variables can also be used as arguments. The name of the variable passed in has nothing to do with the name of the parameter: inside the function, the value is simply referred to by the parameter name. Parameters and any variables created inside a function are local, which means they only exist inside that function.
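A sketch showing that a variable created inside a function doesn't exist outside it. The function cat_twice and the variable names are made up for illustration.

```python
def cat_twice(part1, part2):
    cat = part1 + part2   # `cat` is local to this function
    print(cat)
    print(cat)

line1 = "Bing tiddle "
line2 = "tiddle bang."
cat_twice(line1, line2)   # the arguments are assigned to part1 and part2

try:
    print(cat)            # `cat` does not exist outside the function
except NameError as err:
    print("NameError:", err)
```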
Some functions, like the math functions, return results; this book calls them fruitful functions. Other functions, like print(), just perform an action and then exit without returning a value; these are called void functions. When a fruitful function is called, its return value is normally assigned to a variable or used as part of an expression:
```python
x = math.cos(math.pi)
```
When a function is called in an interactive window, the result is displayed, but in a script the result is discarded unless it is stored or used. Void functions might display something or have some other effect, but they don't return a useful value. If you try to assign the result to a variable, it will store the special value None.
```python
>>> x = print_twice(5)
5
5
>>> print(x)
None
```
As covered in Chapter 2, None is a special value with its own type. Creating a function that returns a value requires the aptly named return statement. A basic function that returns the sum of two numbers could look like this:
```python
def add_two(a, b):
    added = a + b
    return added

x = add_two(3, 5)
print(x)
```
If the above script is run, it prints 8 because x was assigned the value 8 returned by add_two. The add_two function took the arguments 3 and 5, added them together, and passed the result to a return statement.
Functions are useful for several reasons:
- Creating functions bundles related statements together, making the program easier to read and debug.
- Functions can eliminate duplicate code, since the code exists in one place and can be run many times. This also makes debugging or changing the code easier, since it only needs to be changed in one location.
- Dividing a long program into functions makes it possible to debug or test one section at a time and then assemble all of the functions into one working piece.
- Well-designed functions are often useful for many programs. Once a function has been written and debugged, it can be reused in other programs.
It is common to update a variable based on its existing value, such as x = x + 1. For this statement to work, x must already exist, since its current value is needed to calculate the new one. This can be done by first assigning (initializing) a value to it:
```python
x = 0
x = x + 1
x += 1  # shorthand for x = x + 1, so x is now 2
```
Since computers are very good at doing repetitive tasks, Python has several features to make repetition easier.
One form is the while statement. A while loop keeps running as long as the condition following while is true. See the example code block:
```python
n = 5
while n > 0:
    print(n)
    n -= 1
print("Blastoff")
```
A while statement can mostly be read like standard English. Just like if statements, the code that runs within the loop must be indented. This example sets n to 5 and then evaluates whether it is greater than 0. Since that is True, it prints the value and subtracts 1. Then it returns to the beginning and evaluates whether n (now 4) is greater than 0. This pattern continues until n is 0 and the condition evaluates to False. At that point, execution continues at the next un-indented line and prints "Blastoff".
Each execution of the loop body is called an iteration. It is important to make sure the loop will end, by having some variable change on every iteration so that the while condition eventually becomes False. The variable that changes can be called the iteration variable. If there is no iteration variable or some other signal that will eventually cause an exit, the loop will be an infinite loop.
An infinite loop goes on forever unless a break statement is executed somewhere in the loop. Code such as:
```python
n = 10
while True:
    print(n, end=" ")
    n -= 1
print("Done!")
```
will run forever (the final print is never reached) until the program is manually terminated. Pressing Ctrl+C or closing the program are the two main ways to stop a program stuck in an infinite loop. However, deliberately writing a while True loop that contains a way to exit can be useful.
```python
while True:
    line = input("> ")
    if line == "done":
        break
    print(line)
print("Done!")
```
This code asks for user input until the word "done" is typed. When "done" is typed, the if condition is True, and break immediately exits the loop.
Sometimes it is useful to end the current iteration early based on some condition and immediately start the next one. The statement to do this is continue. When continue is executed, the current iteration stops and execution jumps back to the start of the loop.
```python
while True:
    line = input("> ")
    if line.startswith("#"):  # safe for empty input, unlike line[0]
        continue
    if line == "done":
        break
    print(line)
print("Done!")
```
This code echoes each input until "done" is typed. Any input that starts with # is skipped, and another input is requested.
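To see break and continue working together without typing at a prompt, here is a sketch that runs the same logic over a list of lines. The function echo_lines is hypothetical; it stands in for the input() loop above.

```python
def echo_lines(lines):
    """Hypothetical helper: mimic the input loop using a list of lines."""
    echoed = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment lines and move on to the next line
        if line == "done":
            break     # stop processing entirely
        echoed.append(line)
    return echoed

print(echo_lines(["hello", "# a comment", "", "world", "done", "never seen"]))
# ['hello', '', 'world']
```

Note that the empty line passes through safely, because "".startswith("#") is simply False, whereas ""[0] would raise an IndexError.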
If there is a need to loop through a list of things, a for loop can be a better option. for loops are called definite loops, since they run exactly once for each item in the list.
```python
friends = ["Joseph", "Glenn", "Sally"]
for friend in friends:
    print("Happy New Year:", friend)
print("Done!")
```
As this example shows, the syntax of a for loop is very similar to a while loop. This example creates a list of friends, and on each iteration of the for loop, one of the friends is assigned to the variable friend, starting at friends[0] ("Joseph") and continuing until every friend has gone through the loop. This takes 3 iterations in total.
It is important to note that for and in are reserved words in Python. for designates that a loop will be made, and friend is the iteration variable that steps through the elements of friends.
Loops are normally used to go through a list of items or the contents of a file. These loops are generally structured by:
- Initializing one or more variables before the loop starts
- Performing some computation on each item in the loop body, possibly changing the variables in the body of the loop
- Looking at the resulting variables when the loop completes
To count the number of items in a list, a for loop can be used:
```python
count = 0
for iter_var in [3, 41, 12, 9, 74, 15]:
    count += 1
print("Count: ", count)
```
The count variable is initialized to 0 so that it exists before the loop and starts at 0. iter_var is the iteration variable. Its values (the items in the list) aren't used in the body here, but they are available there. The body of the for loop just adds 1 to count on each iteration. Once the loop is finished, the final value of count is printed.
A similar loop that finds the total of all of the values in the list would look like:
```python
total = 0
for iter_var in [3, 41, 12, 9, 74, 15]:
    total += iter_var
print(f"Total = {total}")
```
This example uses the iteration variable iter_var by adding its value to total, which was initialized outside of the loop body so that there wouldn't be an error when it was added to within the body. These code snippets aren't very useful on their own, since the built-in len() and sum() functions perform the same tasks. In general, it is best to rely on Python's built-in functions, since they have had more optimization applied to them than a hand-written version of something routine.
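The two loops above collapse to one call each with the built-ins:

```python
numbers = [3, 41, 12, 9, 74, 15]
print("Count:", len(numbers))   # Count: 6
print("Total =", sum(numbers))  # Total = 154
print("Average =", sum(numbers) / len(numbers))
```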
To find the largest value in a list, we can create a loop like so:
```python
largest = None
print(f"Before: {largest}")
for i, iter_var in enumerate([3, 41, 12, 9, 74, 15]):
    # enumerate returns the position as well as the value of each item in a
    # for loop. The first value (i) is the position/loop number and iter_var
    # is the value at that position.
    if largest is None or iter_var > largest:
        # This only runs if largest is None or if the current iter_var is
        # larger than the largest value seen so far. Since the list starts
        # with 3, largest is first assigned 3. When 41 shows up next, it is
        # greater than 3, so largest is reassigned. The next value, 12, is
        # less than 41, so largest is not reassigned.
        largest = iter_var
    print(f"Loop {i}: iter_var = {iter_var}: largest = {largest}")
print(f"Largest = {largest}")
```
The variable largest can be thought of as the largest value that has been seen up to that point. Before the loop, it is set to None, a special value used here to mark that no value has been seen yet.
Changing this code to find the smallest value only takes one small change (the variable names are also changed for clarity).
```python
smallest = None
print(f"Before: {smallest}")
for i, iter_var in enumerate([3, 41, 12, 9, 74, 15]):
    if smallest is None or iter_var < smallest:
        # The only real change here is swapping the greater than sign for a
        # less than sign. The logic is otherwise exactly the same.
        smallest = iter_var
    print(f"Loop {i}: iter_var = {iter_var}: smallest = {smallest}")
print(f"Smallest = {smallest}")
```
These bits of code are also not needed in practice, since Python has built-in max() and min() functions that make hand-written loops unnecessary. A simple version of the built-in min() function would look like:
```python
def min(values):
    # Note: defining a function named min shadows the built-in min()
    smallest = None
    for value in values:
        if smallest is None or value < smallest:
            smallest = value
    return smallest
```
The print() statements have been removed, and since there is no need to print the loop number, enumerate has also been removed.
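For comparison, here are the built-ins (run this in a fresh session, since the hand-written min() above shadows the built-in one). One difference worth knowing: the built-in min() and max() raise a ValueError on an empty list unless a default is supplied, while the hand-written version returns None.

```python
numbers = [3, 41, 12, 9, 74, 15]
print(max(numbers))  # 74
print(min(numbers))  # 3

print(min([], default=None))  # None
try:
    min([])
except ValueError as err:
    print("ValueError:", err)
```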
As you write larger programs, you will probably spend more time debugging, since more code means more chances for an error. Instead of checking line by line, which becomes tedious, one approach is to start in the middle (debugging by bisection). Checking there tells you whether the error is in the first half or the second half of the program, and the process can be repeated on the remaining half until the cause is narrowed down. Since each check halves the search space, a 100-line program would be down to one or two lines after about 6 checks (100, 50, 25, 13, 7, 4, 2) instead of up to 100.
It isn't always clear where the middle point is, or even whether it's possible to check there, but this process is still better than going line by line.
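Bisection is really a binary search. The sketch below simulates it, assuming a hypothetical check state_ok_after(k) that reports whether the program's state is still correct after line k (in practice that would be a print() or assertion you add by hand). It also assumes that once the state goes wrong it stays wrong.

```python
def find_first_bad_line(n_lines, state_ok_after):
    """Return (first bad line, number of checks) using bisection.

    Hypothetical setup: state_ok_after(k) returns True if the program's
    state is still correct after line k. The state is assumed correct
    before line 1 and wrong after line n_lines.
    """
    low, high = 0, n_lines  # state is OK after `low` and broken after `high`
    checks = 0
    while high - low > 1:
        middle = (low + high) // 2
        checks += 1
        if state_ok_after(middle):
            low = middle    # the bug is somewhere after the middle
        else:
            high = middle   # the bug is at or before the middle
    return high, checks

# Simulate a 100-line program whose bug is on line 63
line, checks = find_first_bad_line(100, lambda k: k < 63)
print(line, checks)  # 63 6
```

Narrowing 100 lines all the way down to a single line never takes more than 7 checks with this approach.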
- accumulator: A variable used in a loop to add up or accumulate a result.
- counter: A variable used in a loop to count the number of times something happened. We initialize a counter to zero and then increment the counter each time we want to “count” something.
- decrement: An update that decreases the value of a variable.
- initialize: An assignment that gives an initial value to a variable that will be updated.
- increment: An update that increases the value of a variable (often by one).
- infinite loop: A loop in which the terminating condition is never satisfied or for which there is no terminating condition.
- iteration: Repeated execution of a set of statements using either a function that calls itself or a loop.
A string is a sequence of characters. Using the bracket operator, it is possible to access individual characters by their position (index).
```python
>>> fruit = "banana"
>>> letter = fruit[1]
>>> print(letter)
a
```
The second statement in the above code extracts the letter at index position 1 and assigns that value to letter. Note that Python uses zero-based indexing, so index position 0 would return "b".
len is a built-in function that returns the number of characters in a string.
```python
>>> fruit = "banana"
>>> len(fruit)
6
```
To get the last letter of a string, it might be tempting to use last = fruit[len(fruit)], but that results in an IndexError, since the last valid index is one less than the length. To get the last letter, subtract 1 from the length: last = fruit[len(fruit)-1].
An alternative is to use negative indices, which count backward from the end. fruit[-1] yields the last letter, while fruit[-2] yields the second to last letter.
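A quick check of the indexing rules above on "banana":

```python
fruit = "banana"
print(fruit[len(fruit) - 1])  # a  (last character)
print(fruit[-1])              # a  (same character, counted from the end)
print(fruit[-2])              # n  (second to last)

try:
    fruit[len(fruit)]         # index 6 is one past the end
except IndexError as err:
    print("IndexError:", err)
```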
There are many reasons to process a string one character at a time. Such processing normally starts at the beginning, selects each character in turn, does something with it, and continues until the end. This is called a traversal. One way to write a traversal is with a while loop:
```python
index = 0
fruit = "banana"
while index < len(fruit):
    letter = fruit[index]
    print(letter)
    index += 1
```
This loop traverses fruit until index is equal to the length of fruit. When index reaches len(fruit), the condition index < len(fruit) is False and the loop ends, so the last character printed is fruit[len(fruit)-1].
Another way to write a traversal is with a for loop:
```python
for char in fruit:
    print(char)
```
Each time through the loop, the next character in the string is assigned to the variable char, and the loop continues until there are no characters left.
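A traversal usually does something with each character rather than just printing it. This sketch combines a traversal with a counter to count how many times a letter appears, then compares it with the built-in str.count() method.

```python
word = "banana"
count = 0
for letter in word:
    if letter == "a":
        count += 1   # counter pattern: increment on every match
print(count)            # 3

# The built-in string method gives the same answer
print(word.count("a"))  # 3
```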
A segment of a string is called a slice. Selecting a slice is similar to selecting a character, but a range is given instead of a single index.
```python
>>> string = 'Monty Python'
>>> print(string[0:5])
Monty
>>> print(string[6:12])
Python
```
The operator [n:m] returns the part of the string from index n up to, but not including, index m. If the first index is omitted, the slice starts at the beginning of the string, and if the second is omitted, the slice continues to the end.
```python
>>> fruit = 'banana'
>>> fruit[:3]
'ban'
>>> fruit[3:]
'ana'
```
If the first index is greater than or equal to the second, the result is an empty string, represented by two quotation marks:
```python
>>> fruit = 'banana'
>>> fruit[3:3]
''
```
An empty string has no length and contains no characters, but other than that it is the same as any other string.
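A few more slices to try. Unlike single-character indexing, slicing past the end of a string does not raise an IndexError; the out-of-range index is simply clamped to the length.

```python
fruit = "banana"
print(fruit[1:4])     # ana
print(fruit[:])       # banana, both indices omitted gives the whole string
print(fruit[3:100])   # ana, the end index is clamped, no IndexError
print(fruit[-3:])     # ana, negative indices work in slices too

empty = fruit[3:3]
print(len(empty))     # 0
print(empty == "")    # True
```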