Lecture 15 Slides - Binary Search Trees

Lecture 15

Binary Search Trees - Special Trees

Logistics

This Week

For some, Ethics Module 5 - Identity

  • Today - Binary Search Trees (MQ11) (Drop Deadline for F25)

All grades prior to Quiz 1 are locked in. All grades between Q1 and Q2 will lock in the moment Q2 begins. Any discrepancies need to be taken care of before Q2.

Next Week

For some, Ethics Module 6 - Impact

  • Monday - Q2 Review (MQ 12)
  • Wednesday - Quiz 2
  • Thursday - Ex 5 due (use Ex 5 as a way to study for Q2)
  • Friday - Wrapping up Functional Programming and Imperatives

Quiz 2 - 10/29 - Wednesday

  • In-person here in the Auditorium (ANU Students follow ANU Instructions)
  • Taken on the LDB on your personal computer; setup details are on the Quiz 2 page (extra practice is also available there).
  • During assigned class time; you MUST attend the one for which you're registered
  • You must be able to connect to eduroam
  • Three Types of Problems - focused on recursion:
    • What's the type of this expression?
    • Here's a program; here's what we want to get out of it; here's what we get; fix it
    • Write a valid test for the given function definition
  • Covers everything up to and including Lecture 15 (Binary Search Trees)
  • You do NOT need to memorize all of the functions (you get a glossary)
  • You DO need to know the Rules of Execution and Special Forms (all are fair game now)
  • You may bring 1 emotional support duck. They are forbidden from quacking.

Topics Overview for Q2

  • All the stuff from Q1
  • Recursion (functions can call themselves)
  • Iterative Recursion (recursion with an accumulator)
  • Inductive (Recursive) Data Definitions
  • Sussman Form (shorthand function definition)
  • cond
  • cons
  • Trees
  • Tree recursion (multiple recursive calls)
  • Binary search trees (a particular type of organized tree)

Binary Trees

  • A binary tree is a tree where every node has at most two children
    • Usually called the left- and right-child
  • Binary trees came up in the pre-recorded lecture, the tutorial, and Exercise 5
    • Each defines them slightly differently because a “tree” is a concept or data structure, not a particular implementation
    • This is why the Inductive Data Definition is so important

Binary Trees

Pre-Recorded Lecture:

(define-struct binary-tree (data left right))
; A binary tree is...
; - an empty list or
; - (make-binary-tree any binary-tree binary-tree)

Tutorial:

(define-struct branch (data left right))
; A binary tree is...
; - a number or
; - (make-branch number binary-tree binary-tree)

Exercise:

(define-struct human (name parentA parentB))
; An ancestry-tree is either
; - an empty list or
; - (make-human string ancestry-tree ancestry-tree)

Tree Recursion

(define (func args …)
  (if easy-case?
      solve-easy-case
      (combine (func one-part)
               (func other-part))))

  • You can often simplify the problem by splitting it
  • Then the fixup step consists of combining the answers to the two subproblems (for binary trees) or to many subproblems (for n-ary trees)
  • This is called tree recursion
    • “I can solve this problem by running this same function on the left branch of this binary tree, running this same function on the right branch of this tree…and then combining the answers”
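
As a minimal sketch of tree recursion, here is a hypothetical count-nodes function over the pre-recorded lecture's binary-tree definition. It is written in #lang racket so it runs as a standalone file; the function name and the example tree are illustrative assumptions. The easy case is the empty tree, and the fixup step adds 1 (for the current node) to the answers from the left and right branches.

```racket
#lang racket
;; Pre-recorded lecture's definition: a binary tree is either
;; - an empty list or
;; - (make-binary-tree any binary-tree binary-tree)
(define-struct binary-tree (data left right))

;; count-nodes : binary-tree -> number
;; Hypothetical example: count every node in the tree.
(define (count-nodes t)
  (if (empty? t)
      0                                        ; easy case: no nodes at all
      (+ 1                                     ; this node...
         (count-nodes (binary-tree-left t))    ; ...plus the left branch
         (count-nodes (binary-tree-right t))))) ; ...plus the right branch

;; An illustrative tree with four nodes: "a" has children "b" and "c",
;; and "c" has a single right child "d".
(define small-tree
  (make-binary-tree "a"
                    (make-binary-tree "b" '() '())
                    (make-binary-tree "c" '() (make-binary-tree "d" '() '()))))

(count-nodes small-tree) ; → 4
```

The same shape (easy case, two recursive calls, combine) works for any binary-tree question, such as summing numeric data or finding the depth.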

Binary Search Tree

  • A tree is a binary tree iff [if and only if] each node has at most two children. A binary tree is a binary search tree if it satisfies the following invariant at every node:
    • All nodes in the left sub-tree have a smaller value than the parent node.
    • All nodes in the right sub-tree have a larger value than the parent node.

  • An invariant is just a constraint that always holds for a given data type.

An Example

Why is this a binary search tree?

  • All nodes in the left sub-tree have a smaller value than its parent node.
  • All nodes in the right sub-tree have a larger value than its parent node.

Bigger Example

This is the root. It has left and right sub-trees.

The right sub-tree has values greater than the parent.

The left sub-tree has values less than the parent.

This value can be greater than 7 and 9…but can't be greater than 12!

This one can't be 8, 9, or 10 because then it would be greater than 7!
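
The callouts above show that each value is constrained by every ancestor, not only by its immediate parent. A minimal sketch of checking that, assuming a hypothetical node struct for a tree of numbers (an empty list or (make-node number bst bst)): each recursive call carries the open interval that every value in the current sub-tree must fall inside. The example numbers loosely echo the callouts (12, 7, 9) but are assumptions, since the original figure is not reproduced here.

```racket
#lang racket
;; Hypothetical definition (not from the slides): a bst is either
;; - an empty list or
;; - (make-node number bst bst)
(define-struct node (value left right))

;; valid-bst? : bst -> boolean
;; true when every node satisfies the invariant with respect to ALL of
;; its ancestors, not only its parent
(define (valid-bst? t)
  (within? t -inf.0 +inf.0))

;; within? : bst number number -> boolean
;; every value in t must be strictly between lo and hi
(define (within? t lo hi)
  (if (empty? t)
      #t
      (let ([v (node-value t)])
        (and (< lo v hi)
             (within? (node-left t) lo v)     ; left side: below v, above lo
             (within? (node-right t) v hi))))) ; right side: above v, below hi

;; 12 at the root, 7 to its left, 9 to the right of 7, then one more node
;; to the right of 9.
(define good-tree
  (make-node 12 (make-node 7 '() (make-node 9 '() (make-node 10 '() '()))) '()))
(define bad-tree ; 13 > 9 locally, but it sits in the LEFT sub-tree of 12
  (make-node 12 (make-node 7 '() (make-node 9 '() (make-node 13 '() '()))) '()))

(valid-bst? good-tree) ; → #t
(valid-bst? bad-tree)  ; → #f
```

Checking only "left child < parent < right child" would wrongly accept bad-tree, which is exactly the "can't be greater than 12" point above.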

Wait…why would this matter?

Why do we care about keeping our data structured?

Why not just smoosh it all together?

Who cares what order it's in?

This is bogus.

Just put it in a list.

Why are we talking about this on a Friday, of all days?

You're Running the Tech Stack for HR at Walmart

  • You have over 2 million employees in the main employee database.
  • That doesn't even include your supply chain employees, your delivery employees, your tech contractors, etc.
  • You need to maintain a "database" (a collection of data) about all of them.
  • Someone asks you to get one particular employee's information out of this database.
  • How do you do it?

Some Simplifications

  • The only data you store about each employee is their social security number (a unique identification number) and their name.
  • Step 1. Figure out a way of representing each employee.
  • Step 2. Figure out a way of storing a bunch of those employees.
  • Step 3. Write a function to look up an employee given their social security number.

Using struct to create an employee

; an employee is…
; - (make-employee number string)
(define-struct employee (ssn name))

Remember, this gives us:

; constructor: …
; predicate: …
; accessors/selectors: …, …

Woo. Step 1 is done!

(make-employee 1 "ava")
(make-employee 2 "george")
(make-employee 3 "jessica")
(make-employee 4 "steve")

Now how do we store a bunch of these things…

Version 1 - Inductive Data Definition

An employee database is simply a list of people.

In other words, a database is either:

  • empty
  • (cons employee database)

Adding a person to the database

Write a function (add employee database) that returns a database with that employee added to the database.

; add: employee database -> database
; puts the employee at the front of the database
(define (add employee db)
  (cons employee db))

(define ava-e (make-employee 1 "ava"))
(define jessica-e (make-employee 3 "jessica"))

(check-expect (add ava-e '()) (list ava-e))
(check-expect (add ava-e (list jessica-e)) (list ava-e jessica-e))

Looking up a person in the database

; lookup-v1: number database -> employee or false
; find the person in the database via their ssn
; false if not found
(define (lookup-v1 n db)
  ...)

(define george-e (make-employee 2 "george"))
(define steve-e (make-employee 4 "steve"))
(define test-db (list jessica-e ava-e george-e steve-e))

(check-expect (lookup-v1 2 test-db)
              george-e)
(check-expect (lookup-v1 2 empty)
              false)
(check-expect (lookup-v1 5 test-db)
              false)

Visualization of Mushroom Kingdom

Looking up a person in the database

; lookup-v1: number database -> employee or false
; find the person in the database via their ssn
; false if not found
(define (lookup-v1 n db)
  (cond [(empty? db) false]
        [(= n (employee-ssn (first db))) (first db)]
        [else (lookup-v1 n (rest db))]))

If we had 100 employees:

  • Best Case: We find the employee right at the beginning of our db - 1 lookup.
  • Worst Case: We go through the entire db and don't find them - 101 lookups (n + 1).
  • Average Case: The employee is in the middle of the list - 50 lookups (n / 2).
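
The counts above can be checked with a minimal sketch: a hypothetical lookup-v1-calls helper that walks the list exactly like lookup-v1 but returns how many times lookup-v1 would be called, including the final call on the empty list. The 100 employees (ssns 1 to 100, all named "someone") are made-up data.

```racket
#lang racket
(define-struct employee (ssn name))

;; lookup-v1-calls : number (listof employee) -> number
;; Hypothetical helper: walks the list exactly like lookup-v1, but returns
;; how many times lookup-v1 would be called instead of the employee.
(define (lookup-v1-calls n db)
  (cond [(empty? db) 1]                               ; the final call on empty
        [(= n (employee-ssn (first db))) 1]           ; found: stop here
        [else (+ 1 (lookup-v1-calls n (rest db)))])) ; this call + the rest

;; 100 made-up employees with ssns 1, 2, ..., 100
(define db-100
  (build-list 100 (lambda (i) (make-employee (+ i 1) "someone"))))

(lookup-v1-calls 1 db-100)   ; best case, ssn is first          → 1
(lookup-v1-calls 50 db-100)  ; ssn is in the middle of the list → 50
(lookup-v1-calls 500 db-100) ; worst case, ssn is not present   → 101
```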

Can we do better?

What if we kept this whole list sorted by ssn…

Version 2 - Inductive Data Definition

An employee database is a sorted list of people.

A database is either:

  • empty
  • (cons employee[e] database[l])

And has the invariant: each ssn in l is larger than the ssn of employee e.
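
A minimal sketch of checking this invariant, assuming a hypothetical sorted-db? function (not from the slides). Comparing each employee with the one right after it is enough: if every neighbor has a larger ssn, then by transitivity every later ssn is larger too. The four employees reuse the names and ssns from earlier.

```racket
#lang racket
(define-struct employee (ssn name))

;; sorted-db? : (listof employee) -> boolean
;; Hypothetical checker for the Version 2 invariant, comparing each
;; employee's ssn with the ssn of the employee right after it.
(define (sorted-db? db)
  (cond [(empty? db) #t]
        [(empty? (rest db)) #t]
        [else (and (< (employee-ssn (first db))
                      (employee-ssn (second db)))
                   (sorted-db? (rest db)))]))

(define ava-e (make-employee 1 "ava"))
(define george-e (make-employee 2 "george"))
(define jessica-e (make-employee 3 "jessica"))
(define steve-e (make-employee 4 "steve"))

(sorted-db? (list ava-e george-e jessica-e steve-e)) ; → #t
(sorted-db? (list jessica-e ava-e george-e steve-e)) ; → #f (Version 1's test-db order)
```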

Visualization of Mushroom Kingdom

Looking up a person in the database

; lookup-v2: number database -> employee or false
; find the person in the database via their ssn
; false if not found
(define (lookup-v2 n db)
  ...)

(check-expect (lookup-v2 2 (list ava-e george-e jessica-e steve-e))
              george-e)
(check-expect (lookup-v2 2 empty)
              false)
(check-expect (lookup-v2 5 (list ava-e george-e jessica-e steve-e))
              false)
(check-expect (lookup-v2 1 (list george-e jessica-e steve-e))
              false)

(define (lookup-v2 n db)
  (cond [(empty? db) false]
        [(< n (employee-ssn (first db))) false]
        [(= n (employee-ssn (first db))) (first db)]
        [else (lookup-v2 n (rest db))]))

If we have 100 employees…

  • Best Case: The employee is right at the beginning of our db, or the ssn is smaller than the first employee's ssn - 1 lookup.
  • Worst Case: The ssn is larger than every ssn in the db, so we go through the entire db and don't find them - 101 lookups (n + 1).
  • Average Case: The employee is in the middle of the list - 50 lookups (n / 2).

Oh wait…WHAT. All that work and no benefit?!?!?
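
A minimal sketch of where sorting does and does not help, assuming a hypothetical lookup-v2-calls helper that follows lookup-v2 but returns the number of calls. The made-up db uses even ssns 2, 4, ..., 200 so that some ssns are missing from the middle. A missing ssn can now stop early, but the worst case is still n + 1 calls, as in the bullets above.

```racket
#lang racket
(define-struct employee (ssn name))

;; lookup-v2-calls : number (listof employee) -> number
;; Hypothetical helper: follows lookup-v2 on a SORTED list, but returns
;; how many times lookup-v2 would be called.
(define (lookup-v2-calls n db)
  (cond [(empty? db) 1]
        [(< n (employee-ssn (first db))) 1]   ; already past where n would be
        [(= n (employee-ssn (first db))) 1]   ; found
        [else (+ 1 (lookup-v2-calls n (rest db)))]))

;; 100 made-up employees with even ssns 2, 4, ..., 200 (sorted)
(define db-100
  (build-list 100 (lambda (i) (make-employee (* 2 (+ i 1)) "someone"))))

(lookup-v2-calls 0 db-100)   ; smaller than every ssn: stops at once → 1
(lookup-v2-calls 101 db-100) ; missing, between 100 and 102           → 51
(lookup-v2-calls 500 db-100) ; larger than every ssn: worst case      → 101
```

An unsorted lookup-v1 would need all 101 calls to rule out ssn 101; the sorted version needs 51. Both still grow linearly with the number of employees.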

Can we do better?

What if we put this in that whole binary search tree thing you talked about a little while ago…

Version 3

An employee database is a sorted tree of people.

In other words, a database is either:

  • empty
  • (make-db-node employee[e] db-left db-right)

(define-struct db-node (employee left right))

  • Invariant: every person in 'left' has a smaller ssn than person 'e' and every person in 'right' has a larger ssn than person 'e'
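
A minimal sketch of a Version 3 database as data, assuming the four employees from earlier (ssns 1 to 4). The tree shape and the name test-tree are illustrative assumptions; many other shapes also satisfy the invariant.

```racket
#lang racket
(define-struct employee (ssn name))
(define-struct db-node (employee left right))

(define ava-e (make-employee 1 "ava"))
(define george-e (make-employee 2 "george"))
(define jessica-e (make-employee 3 "jessica"))
(define steve-e (make-employee 4 "steve"))

;; One possible shape (an assumption):
;;            george (2)
;;           /          \
;;      ava (1)      jessica (3)
;;                          \
;;                       steve (4)
(define test-tree
  (make-db-node george-e
                (make-db-node ava-e '() '())
                (make-db-node jessica-e '() (make-db-node steve-e '() '()))))

;; Reading the tree back out: the root, then one step left and one step right.
(employee-name (db-node-employee test-tree))                 ; → "george"
(employee-name (db-node-employee (db-node-left test-tree)))  ; → "ava"
(employee-name (db-node-employee (db-node-right test-tree))) ; → "jessica"
```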

Visualization of Mushroom Kingdom

Looking up a person in the database

; lookup-v3: number database -> employee or false
; find the person in the database via their ssn
; false if not found
(define (lookup-v3 n db)
  ; what do you do if the db is empty?
  ; what do you do if the person at the root (db-node-employee db)
  ; has the ssn that we're looking for?
  ; what do you do if the number we are looking for is less than
  ; the ssn of the person at the root?
  ; what do you do if the number we are looking for is greater than
  ; the ssn of the person at the root?
  )

Mega Mushroom Kingdom

Looking up a person in the database

; lookup-v3: number database -> employee or false
; find the person in the database via their ssn
; false if not found
(define (lookup-v3 n db)
  (cond [(empty? db) false]
        [(= n (employee-ssn (db-node-employee db))) (db-node-employee db)]
        [(< n (employee-ssn (db-node-employee db))) (lookup-v3 n (db-node-left db))]
        [(> n (employee-ssn (db-node-employee db))) (lookup-v3 n (db-node-right db))]))

  • Best Case: The employee we want is the root of the tree - 1 lookup.
  • Worst Case: The employee we want is at the bottom of the tree. If the tree is balanced, each step throws away about half of the remaining employees, so the question is how many times we need to divide n by 2: (lg n) (this is log base 2). (lg 128) = 7, (lg 256) = 8, (lg 512) = 9… If the tree is lopsided (say, every node only has a right child), the bottom can be n levels down and we are back to n lookups.
  • Average Case: The math here is harder…we'll save it for CS 212 and 214!
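
The lg n worst case depends on the tree being balanced. A minimal sketch comparing two trees that both satisfy the invariant and hold the same 127 made-up employees: one balanced, one where every node only has a right child. The helpers lookup-v3-calls, balanced, and chain are hypothetical; lookup-v3-calls follows lookup-v3 but returns the number of calls.

```racket
#lang racket
(define-struct employee (ssn name))
(define-struct db-node (employee left right))

;; lookup-v3-calls : number database -> number
;; Hypothetical helper: follows lookup-v3, but returns how many times
;; lookup-v3 would be called instead of the employee.
(define (lookup-v3-calls n db)
  (cond [(empty? db) 1]
        [(= n (employee-ssn (db-node-employee db))) 1]
        [(< n (employee-ssn (db-node-employee db)))
         (+ 1 (lookup-v3-calls n (db-node-left db)))]
        [else
         (+ 1 (lookup-v3-calls n (db-node-right db)))]))

;; balanced : number number -> database
;; builds a balanced tree of made-up employees with ssns lo..hi
(define (balanced lo hi)
  (if (> lo hi)
      '()
      (let ([mid (quotient (+ lo hi) 2)])
        (make-db-node (make-employee mid "someone")
                      (balanced lo (- mid 1))
                      (balanced (+ mid 1) hi)))))

;; chain : number number -> database
;; builds a valid but lopsided tree: every node only has a right child
(define (chain lo hi)
  (if (> lo hi)
      '()
      (make-db-node (make-employee lo "someone") '() (chain (+ lo 1) hi))))

(lookup-v3-calls 127 (balanced 1 127)) ; → 7
(lookup-v3-calls 127 (chain 1 127))    ; → 127
```

The balanced tree of 127 employees has 7 levels, matching (lg 128) = 7; the chain is effectively a sorted list, so lookup-v3 does no better than lookup-v2 on it.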

Linear Search vs. Binary Search

Common Big O Classes

| Mathematical Notation | Class Name  | When the input doubles...        |
|-----------------------|-------------|----------------------------------|
| O(1)                  | Constant    | Runtime is unchanged (x 1)       |
| O(log n)              | Logarithmic | Runtime increases by a constant  |
| O(n)                  | Linear      | Runtime is doubled (x 2)         |
| O(n log n)            | Quasilinear | Runtime is approximately doubled |
| O(n^2)                | Quadratic   | Runtime is quadrupled (x 4)      |
| O(n^3)                | Cubic       | Runtime is octupled (x 8)        |
| O(2^n)                | Exponential | Runtime gets...really big        |

Linear search falls in the Linear class; binary search (on a balanced tree) falls in the Logarithmic class.