This file contains the in-class lecture notes related to Chapter 1 of Baase. Some of the class information contained in this file, including directions for getting email announcements and the essential syllabus info, is on-line at: http://edoras.sdsu.edu/~carroll/cs460home.html

Day One: There is a list circulating in the room for you to sign up on; MAKE SURE your name is on it before you leave the room. (All prospective students, not just registered students, should sign this list.)

For CS460, the prerequisites are: CS210 (Data Structures, formerly CS310) and Math245 (Discrete Mathematics). I may also unregister students who have not attended class during the first week, to make room for crashers. I will try to accommodate most students, giving priority to the 'most desperate' (e.g., if you can prove you can finish your Bachelor's degree this semester). If there is a long waitlist, it will be a while before I can give definitive answers on who will be accepted. In that case, I'll probably have you turn in a short assignment after the first week of classes, and those who successfully complete it may be deemed to have met the prerequisites and be allowed to enroll [even if you do not officially have Math245 and CS210 on your transcript].

I don't expect to use the Canvas website very much for this class. In particular, if you email me through the Canvas class site, you almost certainly won't get as prompt an answer as mailing me directly at carroll@edoras.sdsu.edu . [Not-yet-registered students should email me at carroll@edoras.sdsu.edu as well, so that I know that you exist. That will help me ensure you get all the announcements, help, and hints that the enrolled students are getting.]

The syllabus [and exams] are the only paper documents I will distribute this semester.
Just in case you lose the current piece of paper, the electronic form of the syllabus can be found under http://edoras.sdsu.edu/~carroll/ (follow the link there for CS460). The web page will be updated each semester, but the info for Fall 2024 is:

Text: Computer Algorithms: Introduction to Design and Analysis, by Sara Baase and Allen Van Gelder. ISBN: 0201612445

Notes: A collection of the annotated programs and diagrams presented in class. Available at Cal Copy (5187 College Ave, 619-582-9949)

Course Content: Algorithm analysis, induction, recursion and recurrence relations, sorting and selection, graph algorithms, introduction to NP-completeness and approximation algorithms.

Prerequisites: CS210 (Data Structures, formerly CS310) and Math245 (Discrete Mathematics). You must know the material in these courses, and the courses they depend on (e.g., Calculus), or you will be lost.

Grading: Most assignments will involve mathematical analysis, and hence will be turned in much as you would in any math class. (It does NOT have to be typeset, but it must be clear enough for me to follow.) Assignments will comprise 1/4 of your grade, the midterm will account for 1/4, and the final will be worth 1/2. You can find out in advance when all your finals are, including this one. For Fall 2024, for example, the final schedule for SDSU is listed at: https://registrar.sdsu.edu/calendars/finals/fall-2024 ...so just look up the date/time for this class (and your other classes).

Letter grades:
90% and above is guaranteed at least an A-
80% and above is guaranteed at least a B-
70% and above is guaranteed at least a C-
60% and above is guaranteed at least a D-

Policies: Homework and programming assignments are intended to help you learn. Talking over your ideas and problems with other people in the class is very helpful. You may discuss ideas, but you must do your own work and write up your own solutions and programs. In particular, you may NOT work on a program or assignment as a team.
Using another person's work is cheating. Copying a program from a book is plagiarism, just like copying from a paper for a humanities class, unless you give an appropriate citation. If you are in doubt about where the border line is for our assignments, ASK ME. It should go without saying (but past experience suggests saying it) that copying on exams or homework, or other forms of cheating, will be dealt with severely. Automatic failure of the course is guaranteed in such cases, and sanctions can include expulsion from the university. If an assignment is copied (or developed as a team), BOTH parties will fail the course (so, if someone asks to copy your work, point them at this paragraph :-)

Your assignments are due at the beginning of class on the day specified on the assignment. To maintain fairness and uniformity of grading, I cannot accept late assignments. Similarly, there will be no make-up exams except in unusual circumstances (to be determined by me). If you know in advance that you will miss an exam, see me about it in advance. Note the date of our final exam now; don't make plans that conflict with the final. Note in particular that the university policy described at https://registrar.sdsu.edu/calendars/finals/fall-2024 prohibits taking the final early.

Though we will be deeply immersed in computer science topics, this course will feel more like a math course than a computer science course. In particular, there will be NO programming assignments. Most of your homework (and exam problems) will involve inventing rigorous proofs, and working through the details of algorithms 'by hand' to fully understand how they work. [The algorithms we will consider are all well-known, and a web search will quickly turn up usable code if you need to implement them in the future; thus, we will concentrate on understanding, not coding.]

Reading: Chapter 1 (skip section 1.2, for now at least). We will skip Chapter 2, so Chapter 3 will be next.
You undoubtedly have at least a vague idea of what the term 'algorithm' means; roughly, it's a specification of step-by-step instructions that (given some input, like a scrambled list of numbers) leads to an answer to the posed question (such as "What does the sorted version of these numbers look like?"). We are rather picky: we want a *correct* answer, no matter which input we feed the algorithm. (Note that this implies that the carried-out instructions can't go on forever; for any input, the algorithm HAS to return an answer.) If the 'algorithm' sometimes gets irretrievably stuck in a loop, or sometimes returns a wrong answer, then we cannot call it an algorithm.

One surprising result (from almost a century ago) is that there are some problems for which no algorithm can exist. One such [quite reasonable] question that is impossible to reliably answer is: "Given an arbitrary computer program, will this program halt on all inputs?" It's not that no one has been smart enough to write a true algorithm for this question; assuming the existence of such an algorithm leads to logical contradictions -- so no one is trying to solve this, since such an algorithm CANNOT exist. So there are limits on what can be done algorithmically. But there are plenty of things we *do* know how to do well, and that's what the first part of this course is about. (After which, we'll spend about a month considering 'hard' problems for which we have solutions, but not good solutions.)

Section 1.3, Page 12: This section covers things that you should already know; they should have been topics in your Discrete Mathematics class. Make sure you understand these basic concepts. Here are some of the things to which you should pay particular attention, since they will be used in the chapters we will cover this semester: Combinations (Page 13) are related to permutations (Page 16). Make sure you understand the logic behind the formulas. Tuples and cross-products should be familiar to you.
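To refresh the combinatorics formulas, here is a quick sketch in Python (standard library only; the specific numbers are just illustrative, not from the text):

```python
from math import comb, perm, factorial

# Permutations: ordered arrangements of k items chosen from n.
# P(n, k) = n! / (n - k)!
assert perm(5, 2) == 20                          # 5 * 4

# Combinations: unordered selections; divide out the k! orderings.
# C(n, k) = n! / (k! * (n - k)!)
assert comb(5, 2) == 10
assert comb(5, 2) == perm(5, 2) // factorial(2)  # the 'logic behind the formulas'

# Cross-product of sets A and B: all ordered pairs (a, b); |A x B| = |A| * |B|.
A, B = {1, 2, 3}, {'x', 'y'}
pairs = {(a, b) for a in A for b in B}
assert len(pairs) == len(A) * len(B)
```

The last assertion is the counting rule behind both formulas: independent choices multiply.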
Page 14: note that Definition 1.2 only makes sense for AxB if A=B (=S).

Section 1.3.2, Page 15: We will use all this stuff -- note the lg notation at the bottom of the page.

Page 16: Probability -- you should have seen this in Discrete Math, and perhaps in a probability or statistics course as well.

Page 18: The "In general..." paragraph gives a good sense of conditional probability.

Definition 1.5, Stochastic Independence: Consider drawing a card from a 52-card deck. The probability of drawing an Ace is 4/52 (since there are four aces in the deck, and all cards have an equal chance [1 out of 52] of being picked). Now consider drawing a second card and laying it beside the first one. Since there are 51 cards left, each card has a 1 in 51 chance of being picked. Drawing an Ace this second time is no longer an independent event, since the probability depends on what the first card drawn was. (If you picked something other than an Ace the first time, then the next probability is 4/51; if the first card was an Ace, then there are now only 3 other aces left, so the probability is 3/51. Therefore, the probability for this second draw is NOT independent of the first event -- it matters what the first card was.) By contrast, if our 'experiment' consists of drawing a card, then replacing it in the deck, shuffling, and again drawing one of the 52 shuffled cards, these ARE independent events -- the probability of getting an Ace this second time is 4/52, regardless of which card was picked the first time.

Page 19: Definition 1.6, Expected value: This is sometimes called 'average value', though note that it is a *weighted* average.
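The card-drawing probabilities above are easy to check by brute-force enumeration; here is a sketch in Python (my own check, modeling the deck as 4 aces and 48 non-aces, since suits don't matter here):

```python
from fractions import Fraction
from itertools import permutations

deck = ['A'] * 4 + ['x'] * 48
n = len(deck)

# P(ace on the first draw) = 4/52.
assert Fraction(deck.count('A'), n) == Fraction(4, 52)

# All ordered (first, second) draws without replacement: n * (n - 1) outcomes.
# Among outcomes whose first card is an ace, how often is the second one too?
ace_first = [(i, j) for i, j in permutations(range(n), 2) if deck[i] == 'A']
both_aces = [(i, j) for i, j in ace_first if deck[j] == 'A']
p_cond = Fraction(len(both_aces), len(ace_first))
assert p_cond == Fraction(3, 51)   # differs from the unconditional 4/52: NOT independent

# With replacement, the second draw sees the full deck again, so P(ace) = 4/52
# regardless of the first card -- independent events.
```

The enumeration confirms the 3/51 figure: conditioning on the first card changes the probability, which is exactly what dependence means.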
Here's an example, based on rolling a pair of (six-sided) dice. The probabilities of rolling each sum are:

 2: 1/36    5: 4/36    8: 5/36   11: 2/36
 3: 2/36    6: 5/36    9: 4/36   12: 1/36
 4: 3/36    7: 6/36   10: 3/36

The expected value of the sum is then
2*1/36 + 3*2/36 + 4*3/36 + 5*4/36 + 6*5/36 + 7*6/36 + 8*5/36 + 9*4/36 + 10*3/36 + 11*2/36 + 12*1/36
= (2+6+12+20+30+42+40+36+30+22+12)/36 = 252/36 = 7

Lemma 1.2: note the 'one or the other' strategy.

Page 24: Note that Figure 1.3a is monotonic but not convex, and 1.3b is convex but not monotonic.

Page 26: Figure 1.4 shows a neat trick we will need to use, as on more than one occasion we will have to add up things like log(1)+log(2)+log(3)+...+log(n), which is at best tedious for a particular n, and impossible to get an exact expression for in terms of an arbitrary n. But as the figures show, the answer will be approximately the integral of the smooth curve, and it's easy to integrate functions like log(x). In particular, we will need the result of Example 1.8 on page 27.

Section 1.3.3, Page 28: the concepts and equivalences in this section should be familiar from your Discrete Mathematics course; review them carefully, as we will use them over and over this semester.

Announcements: Page xiii of the Preface shows the web page addresses associated with the text. I recommend going to https://users.soe.ucsc.edu/~avg/Supplements/web-errata.pdf ...and downloading this errata file, and making the indicated corrections to the text. Depending on which printing you happen to have, you may find that some of the corrections are already incorporated into your textbook.

Section 1.4, Page 30: New stuff (finally!) Algorithm analysis: 5 criteria are given on Page 30. For the moment, we will concentrate principally on the amount of work done. The five criteria are:

Correctness: an algorithm is of little use if it sometimes gives wrong answers!
(Exception: approximation algorithms) preconditions/postconditions -- see definition on Page 30. Example: binary search (phone book search only works because the book is sorted).

Amount Of Work Done: We need a good way to 'measure' this, and it's hard to say this precisely in a definitive way. We want a measure that is independent of the hardware, programming language, programming style, and 'bookkeeping operations', so just counting instructions or measuring computer time is a bad solution. (And it makes the analysis almost impossibly hard.) We instead concentrate on a 'fundamental operation', and count only those operations when analyzing or comparing algorithms. The concept is best understood by looking at several examples.

Page 32: different problems have different fundamental operations. For matrix multiplication, scalar multiplication is the obvious basic operation. For a more complex analysis of matrix multiplication, one might count all additions, subtractions, multiplications, and divisions, instead of just multiplications. (And sometimes this matters -- one of your homework problems shows that in some special circumstances, a clever algorithm can 'trade' a multiplication for a few additions. In the homework problem, you might try tricks like taking (a+b)*c, and then use addition and subtraction to combine these partial results. (a+b)*c is equivalent to a*c + b*c, but the former involves one fewer multiplication.)

Page 34: To analyze the worst-case complexity of an algorithm, we need to compute the number of basic operations it has to do in the worst case. To do this, we have to discover the most uncooperative input to feed to our algorithm. Naturally, the answer depends on n, the input size (the bigger the problem, the longer it is likely to take to find the answer). This gives an upper bound on the cost for any input of size n, and allows us to make 'guarantees' about how long it will take to solve any problem.
(A handy thing to know if you have to compute, for example, the trajectory of a missile headed toward you.)

Page 35: Often, average-case complexity is a more useful measure (for example, if you're searching an airline database, you probably care more about average performance than worst-case performance). This is generally more difficult to compute, since you have to consider ALL possible inputs, rather than focusing on one particular 'worst' input.

Page 35: The text applies these two definitions [worst-case and average-case] to a simple problem: searching an unsorted array.

Page 39, 1.4.6: Optimality: Optimality measures the inherent complexity of a *problem* -- so far we've only been concerned with the specific complexity of a particular *algorithm*. For a given class of algorithms that solve a problem, we want to know how many 'basic operations' are actually *needed* to solve it. Proving that your algorithm is better than anyone else's algorithm does NOT mean that it is optimal -- there may be some as-yet-undiscovered algorithm that is better. Note that they are talking about the fewest 'basic operations' IN THE WORST CASE; it is much harder to discuss optimality in terms of the average case!

Page 39, 1.4.6: Lower/Upper Bounds:
1) If a given algorithm has been proved correct, then its complexity W(n) gives a [possibly poor] upper bound for the complexity of the corresponding problem.
2) If you can prove a theorem that says that every [correct] algorithm for a given problem must perform at least F(n) 'basic operations' [for some function F], then you have a lower bound for the complexity of the corresponding problem.

If W(n) = F(n), then the algorithm in (1) is optimal; otherwise, we need to be smart enough to prove a better theorem, or to devise a better algorithm, before we can say with certainty what the complexity of the corresponding problem actually is.
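Algorithm 1.1 (sequential search of an unsorted array) will come up repeatedly; a rough Python sketch (my own rendering, not the book's version), instrumented with a comparison counter, makes the worst case concrete:

```python
def sequential_search(E, K):
    """Return (index of K in list E, or -1; number of key comparisons done)."""
    comparisons = 0
    for i, e in enumerate(E):
        comparisons += 1          # the basic operation: compare K to an array entry
        if e == K:
            return i, comparisons
    return -1, comparisons

# Worst case: K is absent (or sits in the last slot) -- all n entries examined.
_, worst = sequential_search([7, 2, 9, 5], 1)
assert worst == 4                 # n comparisons for n = 4

# Best case: K in the first slot -- a single comparison.
_, best = sequential_search([7, 2, 9, 5], 7)
assert best == 1
```

Note that the counter tallies only the 'basic operation' (key comparisons), not loop bookkeeping -- exactly the measuring convention described above.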
The book has two examples in this section: one proof of optimality, and one case where we're not smart enough to fully analyze the problem.

Page 41: Example 1.12, Matrix Multiplication: To multiply two nxn square matrices, the 'standard' algorithm needs n^3 [scalar] multiplications. Thus, n^3 is an upper bound for this problem, but it has been shown to be a rather poor upper bound. There is an algorithm that does n^2.376 multiplications, and that is the best known upper bound at present. The best known lower bound is a theorem that states that matrix multiplication needs at least n^2 multiplications. So, at present we don't know whether there is an undiscovered algorithm that can do the job using only n^2 multiplications, or whether the n^2.376 algorithm will turn out to be the best. (Perhaps neither is the case; the optimal algorithm might turn out to be somewhere in the middle, with complexity between n^2 and n^2.376.)

Pages 40-41: Example 1.11: Finding the largest array element requires n-1 comparisons. READ this example! I will instead do a similar problem, based on our previous example, Example 1.9/Algorithm 1.1 (Page 35). Algorithm 1.1 turns out to be optimal; we have already shown that this *algorithm* requires n comparisons [in the worst case], and we can prove that the *problem* also needs n comparisons [in the worst case]. Hence the upper and lower bounds are identical, so Algorithm 1.1 is optimal. To prove this, we have to show that any algorithm that solves this problem needs at least n 'basic operations' [comparisons] to get the correct answer. (Some inputs might require fewer comparisons, but remember that we are considering worst-case behavior, so just looking at a restricted set of inputs doesn't prove anything.) The analysis of Algorithm 1.1 gives an upper bound on the complexity; note that when looking for a lower bound, it is inappropriate to be looking at features of Algorithm 1.1; we must analyze the problem, not any specific algorithm.
Most algorithms will probably compare the searched-for element with elements in the array, but we can't assume that is the case, since we must consider ALL possible algorithms for this problem (some algorithms might find some clever way to at times compare two array elements to each other, for example). To show that n is a lower bound for this problem, we will argue that any 'algorithm' that does fewer than n comparisons will [at least occasionally] give the wrong answer. Since we don't know anything about how the algorithm works, this can be hard to prove; but luckily, we only have to find ONE input that gives the wrong answer. (Because unless an 'algorithm' gives the correct answer for every possible input, it's not really an algorithm.)

The argument proceeds in a similar fashion to the one in the book for Algorithm 1.3 -- for our new problem, each comparison eliminates at most one array element as a potential match for the searched-for element. We can argue that if fewer than n comparisons are done, then we can construct a data set [that is, an input] for which the algorithm fails. Perhaps the simplest way to prove this is to begin with a data set which does not contain the searched-for element. Any 'optimal' algorithm that always does fewer than n comparisons will therefore halt after doing n-1 [or fewer!] comparisons. If the algorithm is indeed correct, it will report that the searched-for element is not in the array (or else the algorithm gives an incorrect answer, in which case it is definitely NOT an algorithm that solves this problem!). The next part of the proof is to show that we can tweak this data set so that the proposed algorithm still proceeds in the same way, and reaches the same conclusion (that is, that the element is 'not found') -- and do it in such a way that this is now the wrong answer for the tweaked data set.
(If the algorithm always compares an array element to the searched-for element, this is easy to prove: one of the array elements will not have been compared to anything, and we can change the value of this element to match the searched-for value; the proposed algorithm will then still conclude that the element is 'not found' -- only with this new data set, that is the wrong conclusion. If the algorithm instead sometimes compares one array element to another array element, it is a bit trickier to argue that you can tweak the data set in such a way that the algorithm still proceeds along the same decision path and comes to the wrong conclusion, but it can be done.)

For a slightly different class of inputs, Algorithm 1.1 might NOT be optimal. If the value we are searching for is GUARANTEED to be one of the elements in the array, then we do not need n comparisons; if we have done n-1 comparisons without finding a match, then an intelligent algorithm can deduce that the correct index is the [only] slot that has yet to be examined, and thus avoid the nth comparison. Similarly, if the class of inputs is restricted to arrays that are already sorted, then Algorithm 1.1 is definitely not optimal. (Algorithm 1.4 on Pages 55 and 56 is much more efficient.)

Page 42, 1.4.8: Implementation details... ...are often needed to correctly analyze an algorithm [both time and space]. In the set example on Page 42, note that the array solution is good for searching but bad for unioning; the linked list solution is good for unioning but bad for searching. So, it's hard to say which data structure is better for sets, since it depends on what kind of operation you need to perform on the set.

Page 43: taking advantage of features of the hardware. The VAX, for example, had an assembly language instruction that could evaluate an entire polynomial, all in one instruction.
Normally, we would count multiplications [or multiplications and additions] when evaluating an algorithm that had to deal with polynomial evaluation, but perhaps for such VAX programs, the basic operation should instead be that single assembly language instruction.

Page 43, 1.5: Asymptotic growth rates: The 'city' discussion on Page 44 is well worth reading. We are interested in characterizing growth rates based on what happens for large values (in our context, for large values of n, where n is the size of the input fed to the algorithm under consideration).

Page 45: Def 1.14, Lemma 1.5: We don't care about small values of n, and we don't even care about constant factors. That is, f(x)=2x and g(x)=100x are both linear functions, and hence of the same 'order'. h(x)=x^2 and g(x)=100x are NOT of the same order -- h(x) grows faster (even though g(x) has much larger values than h(x) until x gets very large). The important feature here is that if you double the size of x, g(x) becomes twice as large. However, when you double the size of x, h(x) becomes four times as large; hence h grows faster than g, and is therefore considered to be of a different 'order'.

Page 45: Figure 1.5

Page 49, 1.5.2: How much larger can a problem become before it takes four times as much effort to solve? Let's start with n=100, and examine some functions. For a linear function (such as the ones for Algorithm 1.1 or 1.3, SequentialSearch or FindMax), we can go from n=100 to n=400 to make it four times as hard. For quadratic [n^2] functions (such as the one associated with some sorting algorithms), we can only double the size of the problem before it becomes four times as hard (that is, we go from n=100 to n=200). For exponential problems [2^n], we can only go from size n=100 to size n=102 -- each time you add 1 to n, you double the amount of work! (And n=98 was 4 times easier than n=100, etc.)
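The 'four times the work' comparison above can be checked numerically (an illustrative sketch; these work functions are my own stand-ins for the linear, quadratic, and exponential cases):

```python
# Illustrative work functions: linear, quadratic, exponential.
def linear(n):      return n
def quadratic(n):   return n * n
def exponential(n): return 2 ** n

# Starting from n = 100, how far can n grow before the work quadruples?
assert linear(400) == 4 * linear(100)            # n can quadruple
assert quadratic(200) == 4 * quadratic(100)      # n can only double
assert exponential(102) == 4 * exponential(100)  # n can only grow by 2
```

Each +1 in n doubles 2^n, so going from n=100 to n=102 doubles the work twice.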
As you might imagine, exponential problems are quite intractable -- you quickly reach the point where your hardware is overwhelmed by even small increases in the problem size. We will study such intractable problems in Chapter 13. Note that when it comes to asymptotic growth, n^1000000 grows SLOWER than 2^n does. Exponential growth beats polynomial growth every time.

Page 52, 1.5.4, Theorem 1.13: We will find occasion to use these formulas at various points in the text.

Page 53, 1.6: Searching an Ordered Array: Note the last paragraph on Page 53.

1.6.1: The exposition on Pages 54 and 55 is excellent; important points are made with each refinement, so read this section carefully. Algorithm 1.1 can 'quit early' when the key is matched to an array entry, and that is why the average-case complexity was smaller than just n. For the first improvement to the algorithm, we find a way to 'quit early' even in some cases when the key is not in the array, which leads to improvements to the formulas we developed on Pages 36 and 37. Note that we really do need the elements in the array to be sorted in order for this new method to always yield the correct answer. The modified algorithm improves the average-case complexity, but does not help with the worst-case complexity. The next improvement [top of Page 55] improves both the average case and the worst case. Paragraph 2 on Page 55 argues that in the worst case, the cost is now on the order of the square root of n, which is significantly better than the linear cost we had in the previous algorithms.

Further improvements lead to the Binary Search Algorithm (Algorithm 1.4, Pages 55/56), which affords our first look at the 'divide-and-conquer' strategy. In this case, we break up our original problem into two smaller problems, and then reapply the strategy to the smaller problems, and continue in this manner, dividing up the problem until we find a problem so small that the answer is immediately obvious.
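A recursive Python sketch in the spirit of Algorithm 1.4 (my own rendering, not the book's Java) shows the divide-and-conquer shape -- each call either answers immediately or hands off a half-sized subproblem:

```python
def binary_search(E, K, first=0, last=None):
    """Return an index where K occurs in sorted list E, or -1 if absent."""
    if last is None:
        last = len(E) - 1
    if last < first:                 # empty range: the 'immediately obvious' base case
        return -1
    mid = (first + last) // 2
    if K == E[mid]:                  # conceptually one three-way comparison:
        return mid
    elif K < E[mid]:                 # ...branch left,
        return binary_search(E, K, first, mid - 1)
    else:                            # ...or branch right
        return binary_search(E, K, mid + 1, last)

E = [10, 12, 13, 14, 18, 20, 25, 27, 30, 35, 40]
assert binary_search(E, 25) == 6
assert binary_search(E, 11) == -1
```

Each recursive call discards half of the remaining range, which is where the logarithmic behavior discussed next comes from.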
Page 56, 1.6.2: The text mentions 'three-way branch', which means a single comparison that returns one of three results: =, <, or >. Hence, we really don't do a comparison on line 5 and then another one on line 7 -- a single comparison is sufficient. Hence, we do one [three-way] comparison for each recursive call [except for the last call]. Besides these comparisons on lines 5 and 7, there is another line of Java code [line 1] that involves '<'. As mentioned in the first paragraph of 1.6.2, this is NOT considered a comparison. (Why? Hint: look up the definition of the 'basic operation' for this problem.)

Since we divide the problem in half on each recursive call, the first paragraph on Page 57 shows how to easily compute W(n), leading to Theorem 1.14. Note that we could predict that the function would be logarithmic, by observing that you can handle inputs of twice the size by doing only one more comparison.

Page 57, 1.6.3: Note the definition of 'gap'.

        <--- gap
  25    position 0
        <--- gap
  26    position 1
        <--- gap
  27    position 2
        <--- gap
  28    position 3
        <--- gap
  29    ...

When the array size is just one short of a power of 2, then all the gaps require the maximum number of comparisons. Analyzing n=25 shows that for other array sizes, sometimes you get 'lucky' and can decide that the element is 'not found' while doing one fewer comparison than in the worst case. Assumption 2 on Page 58 avoids this messy complication. With n = 2^k - 1 for some whole number k, it's not hard to count the number of comparisons. Only the 'middle' array element can be found with one comparison. The array elements 1/4 and 3/4 of the way through the array can be found with two comparisons, etc. This leads to the first formula for A_1(n). Since k = lg(n+1) (why????), we can write the second formula using only n, and get rid of k altogether. Since for these 'nice' values of k, A_0(n) = lg(n+1), we can combine the results to get Theorem 1.15, showing that A(n) = A_1(n)*q + A_0(n)*(1-q) = lg(n+1) - q.
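The counting argument can be checked directly: for n = 2^k - 1, exactly 2^(t-1) elements are found with t three-way comparisons (one 'middle' element at t=1, two at t=2, and so on). A sketch (my own check, using k=3 so n=7):

```python
from math import log2

k = 3
n = 2 ** k - 1                        # n = 7
# 2^(t-1) elements are found with t comparisons, for t = 1..k.
total = sum(t * 2 ** (t - 1) for t in range(1, k + 1))
assert total == 17 and n == 7         # (1*1 + 2*2 + 3*4) = 17
A1 = total / n                        # average over successful searches: 17/7

# Every unsuccessful search costs k = lg(n+1) comparisons.
A0 = log2(n + 1)
assert A0 == k

# A_1(n) is approximately lg(n+1) - 1 (the approximation used in the text's formula).
assert abs(A1 - (A0 - 1)) < 0.5
```

Note the exact average 17/7 (about 2.43) sits a bit above lg(n+1) - 1 = 2; Theorem 1.15's tidy lg(n+1) - q uses the approximate form.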
Page 59, 1.6.4: The decision tree for Algorithm 1.4 when n=7 is:

        3
       / \
      1   5
     / \ / \
    0  2 4  6

For n=8,9,...,15, we need a fourth decision level, and starting with n=16, we need a fifth level. For n=7, we can now visually see that the worst case requires three comparisons [to descend three levels].

The above tree, and the similar one in Figure 1.8 on Page 60, are both poorly specified. Instead of just '3', it would be useful to indicate the decision being made there, e.g., 'K ? E[3]' (the sought-after value is being compared to E[3]). The result of this comparison is actually a three-way branch, e.g.,

             'K ? E[3]'
            /    |     \
         < /   = |      \ >
          /      |       \
    'K ? E[1]' index=3  'K ? E[5]'

...where the left and right branches lead to subsequent comparisons, but the middle branch is an outcome (the algorithm returns the index 3 as the location of K).

To discuss optimality, remember we have to analyze the *problem*, not a particular algorithm. A proposed algorithm does not have to first compare the key to the 'middle' array element; we have to consider all possible strategies. The proof (Page 60) argues that every array element must appear somewhere in the decision tree, or else the algorithm will give the wrong answer for some data sets [inputs]. Once you have thus proved that the [binary] decision tree must have n nodes in it, it is easy to then argue that a 'balanced' tree will have the shortest depth, and therefore the best worst-case behavior. (Note that the longest path from the root to a leaf corresponds to doing the most comparisons in the algorithm.) In a 'balanced' tree with n nodes, the depth is the ceiling of lg(n+1), and hence this is a lower bound for the complexity. Since the analysis of Algorithm 1.4 proves that this same expression is an upper bound for the complexity of the problem, we have equal upper and lower bounds, and hence we know that Algorithm 1.4 is an optimal solution for the Ordered Array Searching problem.
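The depth bound can be tabulated (a quick sketch of the arithmetic: a binary tree of depth d holds at most 2^d - 1 nodes, so n nodes force depth at least ceil(lg(n+1))):

```python
from math import ceil, log2

def min_depth(n):
    """Minimum depth of a binary decision tree holding n nodes: ceil(lg(n+1))."""
    return ceil(log2(n + 1))

assert min_depth(7) == 3     # a full 3-level tree holds exactly 2^3 - 1 = 7 nodes
assert min_depth(8) == 4     # n = 8..15 needs a fourth level...
assert min_depth(15) == 4
assert min_depth(16) == 5    # ...and n = 16 forces a fifth
```

These values match the level counts read off the decision tree above, and this ceiling is exactly the lower bound used in the optimality proof.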
In the proof near the top of Page 60, a plausibility argument is given to relate the depth of the tree to the maximum number of nodes. A more formal statement of this is Lemma 2.2 (on Page 81), and a serious proof of this relationship requires induction. At the end of the proof on Page 60, this relationship is rephrased with logarithms. This is similar to Lemma 2.3 (also on Page 81), and illustrates how Lemma 2.3 can be easily proved, assuming one has already proved Lemma 2.2.

To see if you understand decision trees, try this question: for Algorithm 1.1, what does the picture of the decision tree look like?

Note that in a mathematical proof, you should start with statements you know are true, and proceed logically to the desired conclusion. However, to discover how the proof should be structured, you often do the exact opposite: start with the desired conclusion and work 'backward'. Just keep in mind that the proof itself should not be written 'backward'!