This file contains the in-class lecture notes related to Chapter 1 of Baase. Some of the class information contained in this file, including directions for getting email announcements and the essential syllabus info, is on-line at: http://edoras.sdsu.edu/~carroll/cs460home.html

Day One: There is a list circulating in the room for you to sign up on; MAKE SURE your name is on it before you leave the room. (All prospective students, not just registered students, should sign this list.)

For CS460, the prerequisites are: CS210 (Data Structures, formerly CS310) and Math245 (Discrete Mathematics). I may also unregister students who have not attended class during the first week, to make room for crashers. I will try to accommodate most students, giving priority to the 'most desperate' (e.g., if you can prove you can finish your Bachelor's degree this semester). If there is a long waitlist, it will be a while before I can give definitive answers on who will be accepted. In that case, I'll probably have you turn in a short assignment after the first week of classes, and those who successfully complete it may be deemed to have met the prerequisites and be allowed to enroll [even if you do not officially have Math245 and CS210 on your transcript].

I don't expect to use the Canvas website very much for this class. In particular, if you email me through the Canvas class site, you almost certainly won't get as prompt an answer as mailing me directly at carroll@edoras.sdsu.edu . [Not-yet-registered students should email me at carroll@edoras.sdsu.edu as well, so that I know that you exist. That will help me ensure you get all the announcements, help, and hints that the enrolled students are getting.]

The syllabus [and exams] are the only paper documents I will distribute this semester.
Just in case you lose the current piece of paper, the electronic form of the syllabus can be found under http://edoras.sdsu.edu/~carroll/ (follow the link there for CS460). The web page will be updated each semester, but the info for Fall 2024 is:

Text: Computer Algorithms: Introduction to Design and Analysis, by Sara Baase and Allen Van Gelder. ISBN: 0201612445

Notes: A collection of the annotated programs and diagrams presented in class. Available at Cal Copy (5187 College Ave, 619-582-9949)

Course Content: Algorithm analysis, induction, recursion and recurrence relations, sorting and selection, graph algorithms, introduction to NP-completeness and approximation algorithms.

Prerequisites: CS210 (Data Structures, formerly CS310) and Math245 (Discrete Mathematics). You must know the material in these courses, and the courses they depend on (e.g., Calculus), or you will be lost.

Grading: Most assignments will involve mathematical analysis, and hence will be turned in much as you would in any math class. (It does NOT have to be typeset, but it must be clear enough for me to follow.) Assignments will comprise 1/4 of your grade, the midterm will account for 1/4, and the final will be worth 1/2. You can find out in advance when all your finals are, including this one. For Fall 2024, for example, the final schedule for SDSU is listed at: https://registrar.sdsu.edu/calendars/finals/fall-2024 ...so just look up the date/time for this class (and your other classes).

Letter grades:
90% and above is guaranteed at least an A-
80% and above is guaranteed at least a B-
70% and above is guaranteed at least a C-
60% and above is guaranteed at least a D-

Policies: Homework and programming assignments are intended to help you learn. Talking over your ideas and problems with other people in the class is very helpful. You may discuss ideas, but you must do your own work and write up your own solutions and programs. In particular, you may NOT work on a program or assignment as a team.
Using another person's work is cheating. Copying a program from a book is plagiarism, just like copying from a paper for a humanities class, unless you give an appropriate citation. If you are in doubt about where the border line is for our assignments, ASK ME. It should go without saying (but past experience suggests saying it) that copying on exams or homework, or other forms of cheating, will be dealt with severely. Automatic failure of the course is guaranteed in such cases, and sanctions can include expulsion from the university. If an assignment is copied (or developed as a team), BOTH parties will fail the course (so, if someone asks to copy your work, point them at this paragraph :-)

Your assignments are due at the beginning of class on the day specified on the assignment. To maintain fairness and uniformity of grading, I cannot accept late assignments. Similarly, there will be no make-up exams except in unusual circumstances (to be determined by me). If you know in advance that you will miss an exam, see me about it in advance. Note the date of our final exam now; don't make plans that conflict with the final. Note in particular that the university policy described at https://registrar.sdsu.edu/calendars/finals/fall-2024 prohibits taking the final early.

Though we will be deeply immersed in computer science topics, this course will feel more like a math course than a computer science course. In particular, there will be NO programming assignments. Most of your homework (and exam problems) will involve inventing rigorous proofs, and working through the details of algorithms 'by hand' to fully understand how they work. [The algorithms we will consider are all well-known, and a web search will quickly turn up usable code if you need to implement them in the future; thus, we will concentrate on understanding, not coding.]

Reading: Chapter 1 (skip section 1.2, for now at least). We will skip Chapter 2, so Chapter 3 will be next.
You undoubtedly have at least a vague idea of what the term 'algorithm' means; roughly, it's a specification of step-by-step instructions that (given some input, like a scrambled list of numbers) leads to an answer to the posed question (such as "What does the sorted version of these numbers look like?"). We are rather picky: we want a *correct* answer, no matter which input we feed the algorithm. (Note that this implies that the carried-out instructions can't go on forever; for any input, the algorithm HAS to return an answer.) If the 'algorithm' sometimes gets irretrievably stuck in a loop, or sometimes returns a wrong answer, then we cannot call it an algorithm.

One surprising result (from almost a century ago) is that there are some problems for which no algorithm can exist. One such [quite reasonable] question that is impossible to reliably answer is: "Given an arbitrary computer program, will this program halt on all inputs?" It's not that no one has been smart enough to write a true algorithm for this question; assuming the existence of such an algorithm leads to logical contradictions -- so no one is trying to solve this, since such an algorithm CANNOT exist. So there are limits on what can be done algorithmically. But there are plenty of things we *do* know how to do well, and that's what the first part of this course is about. (After which, we'll spend about a month considering 'hard' problems for which we have solutions, but not good solutions.)

Section 1.3, Page 12: This section covers things that you should already know; they should have been topics in your Discrete Mathematics class. Make sure you understand these basic concepts. Here are some of the things to which you should pay particular attention, since they will be used in the chapters we will cover this semester: Combinations (Page 13) are related to permutations (Page 16). Make sure you understand the logic behind the formulas. Tuples and cross-products should be familiar to you.
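To refresh the combinatorics formulas, here is a quick sketch in Python (standard library only; the specific numbers are just illustrative, not from the text):

```python
from math import comb, perm, factorial

# Permutations: ordered arrangements of k items chosen from n.
# P(n, k) = n! / (n - k)!
assert perm(5, 2) == 20                          # 5 * 4

# Combinations: unordered selections; divide out the k! orderings.
# C(n, k) = n! / (k! * (n - k)!)
assert comb(5, 2) == 10
assert comb(5, 2) == perm(5, 2) // factorial(2)  # the 'logic behind the formulas'

# Cross-product of sets A and B: all ordered pairs (a, b); |A x B| = |A| * |B|.
A, B = {1, 2, 3}, {'x', 'y'}
pairs = {(a, b) for a in A for b in B}
assert len(pairs) == len(A) * len(B)
```

The last assertion is the counting rule behind both formulas: independent choices multiply.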
Page 14: note that Definition 1.2 only makes sense for AxB if A=B (=S).

Section 1.3.2, Page 15: We will use all this stuff -- note the lg notation at the bottom of the page.

Page 16: Probability -- you should have seen this in Discrete Math, and perhaps in a probability or statistics course as well.

Page 18: The "In general..." paragraph gives a good sense of conditional probability.

Definition 1.5, Stochastic Independence: Consider drawing a card from a 52-card deck. The probability of drawing an Ace is 4/52 (since there are four aces in the deck, and all cards have an equal chance [1 out of 52] of being picked). Now consider drawing a second card and laying it beside the first one. Since there are 51 cards left, each card has a 1 in 51 chance of being picked. Drawing an Ace this second time is no longer an independent event, since the probability depends on what the first card drawn was. (If you picked something other than an Ace the first time, then the next probability is 4/51; if the first card was an Ace, then there are now only 3 other aces left, so the probability is 3/51. Therefore, the probability for this second draw is NOT independent of the first event -- it matters what the first card was.) By contrast, if our 'experiment' consists of drawing a card, then replacing it in the deck, shuffling, and again drawing one of the 52 shuffled cards, these ARE independent events -- the probability of getting an Ace this second time is 4/52, regardless of which card was picked the first time.

Page 19: Definition 1.6, Expected value: This is sometimes called 'average value', though note that it is a *weighted* average.
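The card-drawing probabilities above are easy to check by brute-force enumeration; here is a sketch in Python (my own check, modeling the deck as 4 aces and 48 non-aces, since suits don't matter here):

```python
from fractions import Fraction
from itertools import permutations

deck = ['A'] * 4 + ['x'] * 48
n = len(deck)

# P(ace on the first draw) = 4/52.
assert Fraction(deck.count('A'), n) == Fraction(4, 52)

# All ordered (first, second) draws without replacement: n * (n - 1) outcomes.
# Among outcomes whose first card is an ace, how often is the second one too?
ace_first = [(i, j) for i, j in permutations(range(n), 2) if deck[i] == 'A']
both_aces = [(i, j) for i, j in ace_first if deck[j] == 'A']
p_cond = Fraction(len(both_aces), len(ace_first))
assert p_cond == Fraction(3, 51)   # differs from the unconditional 4/52: NOT independent

# With replacement, the second draw sees the full deck again, so P(ace) = 4/52
# regardless of the first card -- independent events.
```

The enumeration confirms the 3/51 figure: conditioning on the first card changes the probability, which is exactly what dependence means.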
Here's an example, based on rolling a pair of (six-sided) dice. The probabilities of rolling each sum are:

 2: 1/36    5: 4/36    8: 5/36   11: 2/36
 3: 2/36    6: 5/36    9: 4/36   12: 1/36
 4: 3/36    7: 6/36   10: 3/36

The expected value of the sum is then
2*1/36 + 3*2/36 + 4*3/36 + 5*4/36 + 6*5/36 + 7*6/36 + 8*5/36 + 9*4/36 + 10*3/36 + 11*2/36 + 12*1/36
= (2+6+12+20+30+42+40+36+30+22+12)/36 = 252/36 = 7

Lemma 1.2: note the 'one or the other' strategy.

Page 24: Note that Figure 1.3a is monotonic but not convex, and 1.3b is convex but not monotonic.

Page 26: Figure 1.4 shows a neat trick we will need to use, as on more than one occasion we will have to add up things like log(1)+log(2)+log(3)+...+log(n), which is at best tedious for a particular n, and impossible to get an exact expression for in terms of an arbitrary n. But as the figures show, the answer will be approximately the integral of the smooth curve, and it's easy to integrate functions like log(x). In particular, we will need the result of Example 1.8 on page 27.

Section 1.3.3, Page 28: the concepts and equivalences in this section should be familiar from your Discrete Mathematics course; review them carefully, as we will use them over and over this semester.

Announcements: Page xiii of the Preface shows the web page addresses associated with the text. I recommend going to https://users.soe.ucsc.edu/~avg/Supplements/web-errata.pdf ...and downloading this errata file, and making the indicated corrections to the text. Depending on which printing you happen to have, you may find that some of the corrections are already incorporated into your textbook.

Section 1.4, Page 30: New stuff (finally!) Algorithm analysis: 5 criteria are given on Page 30. For the moment, we will concentrate principally on the amount of work done. The five criteria are:

Correctness: an algorithm is of little use if it sometimes gives wrong answers!
(Exception: approximation algorithms) preconditions/postconditions -- see definition on Page 30. Example: binary search (phone book search only works because the book is sorted).

Amount Of Work Done: We need a good way to 'measure' this, and it's hard to say this precisely in a definitive way. We want a measure that is independent of the hardware, programming language, programming style, and 'bookkeeping operations', so just counting instructions or measuring computer time is a bad solution. (And it makes the analysis almost impossibly hard.) We instead concentrate on a 'fundamental operation', and count only those operations when analyzing or comparing algorithms. The concept is best understood by looking at several examples.

Page 32: different problems have different fundamental operations. For matrix multiplication, scalar multiplication is the obvious basic operation. For a more complex analysis of matrix multiplication, one might count all additions, subtractions, multiplications, and divisions, instead of just multiplications. (And sometimes this matters -- one of your homework problems shows that in some special circumstances, a clever algorithm can 'trade' a multiplication for a few additions. In the homework problem, you might try tricks like taking (a+b)*c, and then use addition and subtraction to combine these partial results. (a+b)*c is equivalent to a*c + b*c, but the former involves one fewer multiplication.)

Page 34: To analyze the worst-case complexity of an algorithm, we need to compute the number of basic operations it has to do in the worst case. To do this, we have to discover the most uncooperative input to feed to our algorithm. Naturally, the answer depends on n, the input size (the bigger the problem, the longer it is likely to take to find the answer). This gives an upper bound on the cost for any input of size n, and allows us to make 'guarantees' about how long it will take to solve any problem.
(A handy thing to know if you have to compute, for example, the trajectory of a missile headed toward you.)

Page 35: Often, average-case complexity is a more useful measure (for example, if you're searching an airline database, you probably care more about average performance than worst-case performance). This is generally more difficult to compute, since you have to consider ALL possible inputs, rather than focusing on one particular 'worst' input.

Page 35: The text applies these two definitions [worst-case and average-case] to a simple problem: searching an unsorted array.

Page 39, 1.4.6: Optimality: Optimality measures the inherent complexity of a *problem* -- so far we've only been concerned with the specific complexity of a particular *algorithm*. For a given class of algorithms that solve a problem, we want to know how many 'basic operations' are actually *needed* to solve it. Proving that your algorithm is better than anyone else's algorithm does NOT mean that it is optimal -- there may be some as-yet-undiscovered algorithm that is better. Note that they are talking about the fewest 'basic operations' IN THE WORST CASE; it is much harder to discuss optimality in terms of the average case!

Page 39, 1.4.6: Lower/Upper Bounds:
1) If a given algorithm has been proved correct, then its complexity W(n) gives a [possibly poor] upper bound for the complexity of the corresponding problem.
2) If you can prove a theorem that says that every [correct] algorithm for a given problem must perform at least F(n) 'basic operations' [for some function F], then you have a lower bound for the complexity of the corresponding problem.

If W(n) = F(n), then the algorithm in (1) is optimal; otherwise, we need to be smart enough to prove a better theorem, or to devise a better algorithm, before we can say with certainty what the complexity of the corresponding problem actually is.
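Algorithm 1.1 (sequential search of an unsorted array) will come up repeatedly; a rough Python sketch (my own rendering, not the book's version), instrumented with a comparison counter, makes the worst case concrete:

```python
def sequential_search(E, K):
    """Return (index of K in list E, or -1; number of key comparisons done)."""
    comparisons = 0
    for i, e in enumerate(E):
        comparisons += 1          # the basic operation: compare K to an array entry
        if e == K:
            return i, comparisons
    return -1, comparisons

# Worst case: K is absent (or sits in the last slot) -- all n entries examined.
_, worst = sequential_search([7, 2, 9, 5], 1)
assert worst == 4                 # n comparisons for n = 4

# Best case: K in the first slot -- a single comparison.
_, best = sequential_search([7, 2, 9, 5], 7)
assert best == 1
```

Note that the counter tallies only the 'basic operation' (key comparisons), not loop bookkeeping -- exactly the measuring convention described above.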
The book has two examples in this section: one proof of optimality, and one case where we're not smart enough to fully analyze the problem.

Page 41: Example 1.12, Matrix Multiplication: To multiply two nxn square matrices, the 'standard' algorithm needs n^3 [scalar] multiplications. Thus, n^3 is an upper bound for this problem, but it has been shown to be a rather poor upper bound. There is an algorithm that does n^2.376 multiplications, and that is the best known upper bound at present. The best known lower bound is a theorem that states that matrix multiplication needs at least n^2 multiplications. So, at present we don't know whether there is an undiscovered algorithm that can do the job using only n^2 multiplications, or whether the n^2.376 algorithm will turn out to be the best. (Perhaps neither is the case; the optimal algorithm might turn out to be somewhere in the middle, with complexity between n^2 and n^2.376.)

Pages 40-41: Example 1.11: Finding the largest array element requires n-1 comparisons. READ this example! I will instead do a similar problem, based on our previous example, Example 1.9/Algorithm 1.1 (Page 35). Algorithm 1.1 turns out to be optimal; we have already shown that this *algorithm* requires n comparisons [in the worst case], and we can prove that the *problem* also needs n comparisons [in the worst case]. Hence the upper and lower bounds are identical, so Algorithm 1.1 is optimal. To prove this, we have to show that any algorithm that solves this problem needs at least n 'basic operations' [comparisons] to get the correct answer. (Some inputs might require fewer comparisons, but remember that we are considering worst-case behavior, so just looking at a restricted set of inputs doesn't prove anything.) The analysis of Algorithm 1.1 gives an upper bound on the complexity; note that when looking for a lower bound, it is inappropriate to be looking at features of Algorithm 1.1; we must analyze the problem, not any specific algorithm.
Most algorithms will probably compare the searched-for element with elements in the array, but we can't assume that is the case, since we must consider ALL possible algorithms for this problem (some algorithms might find some clever way to at times compare two array elements to each other, for example). To show that n is a lower bound for this problem, we will argue that any 'algorithm' that does fewer than n comparisons will [at least occasionally] give the wrong answer. Since we don't know anything about how the algorithm works, this can be hard to prove; but luckily, we only have to find ONE input that gives the wrong answer. (Because unless an 'algorithm' gives the correct answer for every possible input, it's not really an algorithm.)

The argument proceeds in a similar fashion to the one in the book for Algorithm 1.3 -- for our new problem, each comparison eliminates at most one array element as a potential match for the searched-for element. We can argue that if fewer than n comparisons are done, then we can construct a data set [that is, an input] for which the algorithm fails. Perhaps the simplest way to prove this is to begin with a data set which does not contain the searched-for element. Any 'optimal' algorithm that always does fewer than n comparisons will therefore halt after doing n-1 [or fewer!] comparisons. If the algorithm is indeed correct, it will report that the searched-for element is not in the array (or else the algorithm gives an incorrect answer, in which case it is definitely NOT an algorithm that solves this problem!). The next part of the proof is to show that we can tweak this data set so that the proposed algorithm still proceeds in the same way, and reaches the same conclusion (that is, that the element is 'not found') -- and do it in such a way that this is now the wrong answer for the tweaked data set.
(If the algorithm always compares an array element to the searched-for element, this is easy to prove: one of the array elements will not have been compared to anything, and we can change the value of this element to match the searched-for value; the proposed algorithm will then still conclude that the element is 'not found' -- only with this new data set, that is the wrong conclusion. If the algorithm instead sometimes compares one array element to another array element, it is a bit trickier to argue that you can tweak the data set in such a way that the algorithm still proceeds along the same decision path and comes to the wrong conclusion, but it can be done.)

For a slightly different class of inputs, Algorithm 1.1 might NOT be optimal. If the value we are searching for is GUARANTEED to be one of the elements in the array, then we do not need n comparisons; if we have done n-1 comparisons without finding a match, then an intelligent algorithm can deduce that the correct index is the [only] slot that has yet to be examined, and thus avoid the nth comparison. Similarly, if the class of inputs is restricted to arrays that are already sorted, then Algorithm 1.1 is definitely not optimal. (Algorithm 1.4 on Pages 55 and 56 is much more efficient.)

Page 42, 1.4.8: Implementation details... ...are often needed to correctly analyze an algorithm [both time and space]. In the set example on Page 42, note that the array solution is good for searching but bad for unioning; the linked list solution is good for unioning but bad for searching. So, it's hard to say which data structure is better for sets, since it depends on what kind of operation you need to perform on the set.

Page 43: taking advantage of features of the hardware. The VAX, for example, had an assembly language instruction that could evaluate an entire polynomial, all in one instruction.
Normally, we would count multiplications [or multiplications and additions] when evaluating an algorithm that had to deal with polynomial evaluation, but perhaps for such VAX programs, the basic operation should instead be that single assembly language instruction.

Page 43, 1.5: Asymptotic growth rates: The 'city' discussion on Page 44 is well worth reading. We are interested in characterizing growth rates based on what happens for large values (in our context, for large values of n, where n is the size of the input fed to the algorithm under consideration).

Page 45: Def 1.14, Lemma 1.5: We don't care about small values of n, and we don't even care about constant factors. That is, f(x)=2x and g(x)=100x are both linear functions, and hence of the same 'order'. h(x)=x^2 and g(x)=100x are NOT of the same order -- h(x) grows faster (even though g(x) has much larger values than h(x) until x gets very large). The important feature here is that if you double the size of x, g(x) becomes twice as large. However, when you double the size of x, h(x) becomes four times as large; hence h grows faster than g, and is therefore considered to be of a different 'order'.

Page 45: Figure 1.5

Page 49, 1.5.2: How much larger can a problem become before it takes four times as much effort to solve? Let's start with n=100, and examine some functions. For a linear function (such as the ones for Algorithm 1.1 or 1.3, SequentialSearch or FindMax), we can go from n=100 to n=400 to make it four times as hard. For quadratic [n^2] functions (such as the one associated with some sorting algorithms), we can only double the size of the problem before it becomes four times as hard (that is, we go from n=100 to n=200). For exponential problems [2^n], we can only go from size n=100 to size n=102 -- each time you add 1 to n, you double the amount of work! (And n=98 was 4 times easier than n=100, etc.)
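The 'four times the work' comparison above can be checked numerically (an illustrative sketch; these work functions are my own stand-ins for the linear, quadratic, and exponential cases):

```python
# Illustrative work functions: linear, quadratic, exponential.
def linear(n):      return n
def quadratic(n):   return n * n
def exponential(n): return 2 ** n

# Starting from n = 100, how far can n grow before the work quadruples?
assert linear(400) == 4 * linear(100)            # n can quadruple
assert quadratic(200) == 4 * quadratic(100)      # n can only double
assert exponential(102) == 4 * exponential(100)  # n can only grow by 2
```

Each +1 in n doubles 2^n, so going from n=100 to n=102 doubles the work twice.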
As you might imagine, exponential problems are quite intractable -- you quickly reach the point where your hardware is overwhelmed by even small increases in the problem size. We will study such intractable problems in Chapter 13. Note that when it comes to asymptotic growth, n^1000000 grows SLOWER than 2^n does. Exponential growth beats polynomial growth every time.

Page 52, 1.5.4, Theorem 1.13: We will find occasion to use these formulas at various points in the text.

Page 53, 1.6: Searching an Ordered Array: Note the last paragraph on Page 53.

1.6.1: The exposition on Pages 54 and 55 is excellent; important points are made with each refinement, so read this section carefully. Algorithm 1.1 can 'quit early' when the key is matched to an array entry, and that is why the average-case complexity was smaller than just n. For the first improvement to the algorithm, we find a way to 'quit early' even in some cases when the key is not in the array, which leads to improvements to the formulas we developed on Pages 36 and 37. Note that we really do need the elements in the array to be sorted in order for this new method to always yield the correct answer. The modified algorithm improves the average-case complexity, but does not help with the worst-case complexity. The next improvement [top of Page 55] improves both the average case and the worst case. Paragraph 2 on Page 55 argues that in the worst case, the cost is now on the order of the square root of n, which is significantly better than the linear cost we had in the previous algorithms.

Further improvements lead to the Binary Search Algorithm (Algorithm 1.4, Pages 55/56), which affords our first look at the 'divide-and-conquer' strategy. In this case, we break up our original problem into two smaller problems, and then reapply the strategy to the smaller problems, and continue in this manner, dividing up the problem until we find a problem so small that the answer is immediately obvious.
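A recursive Python sketch in the spirit of Algorithm 1.4 (my own rendering, not the book's Java) shows the divide-and-conquer shape -- each call either answers immediately or hands off a half-sized subproblem:

```python
def binary_search(E, K, first=0, last=None):
    """Return an index where K occurs in sorted list E, or -1 if absent."""
    if last is None:
        last = len(E) - 1
    if last < first:                 # empty range: the 'immediately obvious' base case
        return -1
    mid = (first + last) // 2
    if K == E[mid]:                  # conceptually one three-way comparison:
        return mid
    elif K < E[mid]:                 # ...branch left,
        return binary_search(E, K, first, mid - 1)
    else:                            # ...or branch right
        return binary_search(E, K, mid + 1, last)

E = [10, 12, 13, 14, 18, 20, 25, 27, 30, 35, 40]
assert binary_search(E, 25) == 6
assert binary_search(E, 11) == -1
```

Each recursive call discards half of the remaining range, which is where the logarithmic behavior discussed next comes from.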
Page 56, 1.6.2: The text mentions 'three-way branch', which means a single comparison that returns one of three results: =, <, or >. Hence, we really don't do a comparison on line 5 and then another one on line 7 -- a single comparison is sufficient. Hence, we do one [three-way] comparison for each recursive call [except for the last call]. Besides these comparisons on lines 5 and 7, there is another line of Java code [line 1] that involves '<'. As mentioned in the first paragraph of 1.6.2, this is NOT considered a comparison. (Why? Hint: look up the definition of the 'basic operation' for this problem.)

Since we divide the problem in half on each recursive call, the first paragraph on Page 57 shows how to easily compute W(n), leading to Theorem 1.14. Note that we could predict that the function would be logarithmic, by observing that you can handle inputs of twice the size by doing only one more comparison.

Page 57, 1.6.3: Note the definition of 'gap'.

        <--- gap
  25    position 0
        <--- gap
  26    position 1
        <--- gap
  27    position 2
        <--- gap
  28    position 3
        <--- gap
  29    ...

When the array size is just one short of a power of 2, then all the gaps require the maximum number of comparisons. Analyzing n=25 shows that for other array sizes, sometimes you get 'lucky' and can decide that the element is 'not found' while doing one fewer comparison than in the worst case. Assumption 2 on Page 58 avoids this messy complication. With n = 2^k - 1 for some whole number k, it's not hard to count the number of comparisons. Only the 'middle' array element can be found with one comparison. The array elements 1/4 and 3/4 of the way through the array can be found with two comparisons, etc. This leads to the first formula for A_1(n). Since k = lg(n+1) (why????), we can write the second formula using only n, and get rid of k altogether. Since for these 'nice' values of k, A_0(n) = lg(n+1), we can combine the results to get Theorem 1.15, showing that A(n) = A_1(n)*q + A_0(n)*(1-q) = lg(n+1) - q.
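The counting argument can be checked directly: for n = 2^k - 1, exactly 2^(t-1) elements are found with t three-way comparisons (one 'middle' element at t=1, two at t=2, and so on). A sketch (my own check, using k=3 so n=7):

```python
from math import log2

k = 3
n = 2 ** k - 1                        # n = 7
# 2^(t-1) elements are found with t comparisons, for t = 1..k.
total = sum(t * 2 ** (t - 1) for t in range(1, k + 1))
assert total == 17 and n == 7         # (1*1 + 2*2 + 3*4) = 17
A1 = total / n                        # average over successful searches: 17/7

# Every unsuccessful search costs k = lg(n+1) comparisons.
A0 = log2(n + 1)
assert A0 == k

# A_1(n) is approximately lg(n+1) - 1 (the approximation used in the text's formula).
assert abs(A1 - (A0 - 1)) < 0.5
```

Note the exact average 17/7 (about 2.43) sits a bit above lg(n+1) - 1 = 2; Theorem 1.15's tidy lg(n+1) - q uses the approximate form.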
Page 59, 1.6.4: The decision tree for Algorithm 1.4 when n=7 is:

        3
       / \
      1   5
     / \ / \
    0  2 4  6

For n=8,9,...,15, we need a fourth decision level, and starting with n=16, we need a fifth level. For n=7, we can now visually see that the worst case requires three comparisons [to descend three levels].

The above tree, and the similar one in Figure 1.8 on Page 60, are both poorly specified. Instead of just '3', it would be useful to indicate the decision being made there, e.g., 'K ? E[3]' (the sought-after value is being compared to E[3]). The result of this comparison is actually a three-way branch, e.g.,

             'K ? E[3]'
            /    |     \
         < /   = |      \ >
          /      |       \
    'K ? E[1]' index=3  'K ? E[5]'

...where the left and right branches lead to subsequent comparisons, but the middle branch is an outcome (the algorithm returns the index 3 as the location of K).

To discuss optimality, remember we have to analyze the *problem*, not a particular algorithm. A proposed algorithm does not have to first compare the key to the 'middle' array element; we have to consider all possible strategies. The proof (Page 60) argues that every array element must appear somewhere in the decision tree, or else the algorithm will give the wrong answer for some data sets [inputs]. Once you have thus proved that the [binary] decision tree must have n nodes in it, it is easy to then argue that a 'balanced' tree will have the shortest depth, and therefore the best worst-case behavior. (Note that the longest path from the root to a leaf corresponds to doing the most comparisons in the algorithm.) In a 'balanced' tree with n nodes, the depth is the ceiling of lg(n+1), and hence this is a lower bound for the complexity. Since the analysis of Algorithm 1.4 proves that this same expression is an upper bound for the complexity of the problem, we have equal upper and lower bounds, and hence we know that Algorithm 1.4 is an optimal solution for the Ordered Array Searching problem.
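The depth bound can be tabulated (a quick sketch of the arithmetic: a binary tree of depth d holds at most 2^d - 1 nodes, so n nodes force depth at least ceil(lg(n+1))):

```python
from math import ceil, log2

def min_depth(n):
    """Minimum depth of a binary decision tree holding n nodes: ceil(lg(n+1))."""
    return ceil(log2(n + 1))

assert min_depth(7) == 3     # a full 3-level tree holds exactly 2^3 - 1 = 7 nodes
assert min_depth(8) == 4     # n = 8..15 needs a fourth level...
assert min_depth(15) == 4
assert min_depth(16) == 5    # ...and n = 16 forces a fifth
```

These values match the level counts read off the decision tree above, and this ceiling is exactly the lower bound used in the optimality proof.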
In the proof near the top of Page 60, a plausibility argument is given to relate the depth of the tree to the maximum number of nodes. A more formal statement of this is Lemma 2.2 (on Page 81), and a serious proof of this relationship requires induction. At the end of the proof on Page 60, this relationship is rephrased with logarithms. This is similar to Lemma 2.3 (also on Page 81), and illustrates how Lemma 2.3 can be easily proved, assuming one has already proved Lemma 2.2.

To see if you understand decision trees, try this question: for Algorithm 1.1, what does the picture of the decision tree look like?

Note that in a mathematical proof, you should start with statements you know are true, and proceed logically to the desired conclusion. However, to discover how the proof should be structured, you often do the exact opposite: start with the desired conclusion and work 'backward'. Just keep in mind that the proof itself should not be written 'backward'!