Development Cycles
  Home arrow Development Cycles arrow Page 4 - Thoughts on the Craft of Programming: Abst...
Dev Articles Forums 
ADO.NET  
Apache  
ASP  
ASP.NET  
C#  
C++  
ColdFusion  
COM/COM+  
Delphi-Kylix  
Design Usability  
Development Cycles  
DHTML  
Embedded Tools  
Flash  
Graphic Design  
HTML  
IIS  
Interviews  
Java  
JavaScript  
MySQL  
Oracle  
Photoshop  
PHP  
Reviews  
Ruby-on-Rails  
SQL  
SQL Server  
Style Sheets  
VB.Net  
Visual Basic  
Web Authoring  
Web Services  
Web Standards  
XML  
Mobile Linux 
App Generation ROI 
IBM® developerWorks 
Sun Developer Network 
Weekly Newsletter
 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid 
Request Media Kit
Contact Us 
Site Map 
Privacy Policy 
Support 
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
DEVELOPMENT CYCLES

Thoughts on the Craft of Programming: Abstraction, Refactoring, and How Changes Introduce Bugs
By: The Rational Edge
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: 5 stars5 stars5 stars5 stars5 stars / 4
    2003-08-13

    Table of Contents:
  • Thoughts on the Craft of Programming: Abstraction, Refactoring, and How Changes Introduce Bugs
  • Bugs
  • Cleaning Code
  • Vowels
  • An Aside on Programming by Contract
  • Lesson

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      Del.ici.ous Digg
      Blink Simpy
      Google Spurl
      Y! MyWeb Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article
     
     
    ADVERTISEMENT


    Thoughts on the Craft of Programming: Abstraction, Refactoring, and How Changes Introduce Bugs - Vowels


    (Page 4 of 6 )

    Refactor to Minimize the Scope of the Definition of "Vowel"

    That code looks reasonably good, but it does combine two functions that are really quite separate: scanning the string and deciding whether a character is a vowel. Once again, our artistic antennae are alerted. Scanning a string is an instance of the more general function of traversing a data collection, so with the Visitor pattern at the back of our minds we decide to separate the two functions.

    This also makes sense because the definition of "vowel" may vary between languages (and indeed may not exist at all in some), so this is a part of the program that looks as though it might depend on the target environment and hence one that should be isolated into a small function. With this change, the total amount of text has increased a little, but the conceptual complexity has been reduced because each function can now be viewed in isolation.

    Although I will not demonstrate it here, the code shown in Figure 6 is ready for further generalization, because any is_x function with the same signature as is_vowel could be passed as an argument to the string scanning function, and so all sorts of different tests could be realized by writing only additional code (i.e., not changing existing code).

    In C++ or Java, instead of passing a function address (impossible in Java) one would wrap each function in its own class, declaring is_x as a method and pass as the argument an object of the appropriate class (the functor idiom), and in C# one could use a delegate, but the principle is exactly the same.

    Now that it is a separate function, is_vowel invites a more efficient implementation -- for example, one based on a 256-byte array of character properties in the manner of the isalpha function and its companions5. This would be worth doing if profiling revealed that this function was an execution bottleneck.

    The function is_vowel is a candidate for an independent unit test. Indeed, since there are only 256 possible values of c6, this is an example of that rare case: a piece of code that can be exhaustively tested. So we should take advantage of this and write and execute such a test in this instance; this would be particularly worthwhile if we planned to improve the implementation as suggested in the previous paragraph. We'll return to this point a little later.

     

    Figure 6: Refactor to Minimize the Scope of the Definition of "Vowel" Code

    A New Feature: Counting the Number of Vowel Pairs

    The new feature is to count the number of vowel pairs in the string, or more precisely the number of instances where s[i] and s[i+1] are both vowels; for example, the vowel pair count for "beautiful" is 2, for "ea" and "au"7. Figure 7 shows the new function, count_vowelpairs, and its supporting function, is_vowelpair:

    Figure 7: Counting Vowel Pairs

    That was easy -- a little copy and paste, a minor edit, a one-line function, and we're done. Note how, because of the previous refactoring, we were able to use the is_vowelpair function to implement is_vowelpair. Note also that if in the original implementation of is_vowel we had passed a character pointer instead of a character, we could have implemented is_vowelpair as an is_x function and used the function pointer protocol mentioned previously to implement the new feature without having to implement count_vowelpairs as a separate new function. But we were not sufficiently farsighted, and the decision to pass a character was a wise one, based on the principle that it is safer not to pass a pointer if it can be avoided.

    But are we really done? No, we have been a little cavalier in our approach, and the program contains an error, one that might well evade testing8.

    On hearing that, a programmer might first be drawn to the is_vowelpair function because it dereferences (p+1). Is that always a valid address? Yes, in fact it is, because the test in the for statement guarantees that p always points to a character in the string, so even when p is pointing to the last character in the string, (p+1) is pointing at the null character terminating the string and so is still a valid address. We were lucky when we made the change, because we had not explicitly considered this. There is no way in C to protect is_vowelpair from being invoked with a pointer p such that (p+1) is an invalid address; we resolve to ensure that the specification for is_vowelpair contains a prominent notice of this precondition.

    So what is the bug? It is this: if the string ends in a vowel x, the character pair "x\0" will be incorrectly identified as a vowel pair. The groundwork for this bug was laid in the improvement from Version 3 to Version 4; it turns out that the if test and the strchr test are not exactly equivalent. The function strchr considers the terminating null character to be part of the string for the purposes of the search, so searching for the null character will always succeed and return a non-zero pointer. This is documented in the specification for strchr but is not well known; it is not mentioned in the MSDN Library documentation for the Microsoft Foundation Classes method CString::Find (which itself calls strchr). Yet it is on such minutiż that the correct execution of a program may hinge.

    Let's return briefly to the point about unit testing in the previous section. The long-term value of such testing is now more clearly apparent, and if we had doubts earlier about whether the investment of time to write and execute the test would be repaid, those doubts are now probably resolved. However, tests must be written with as much care as code, and in this particular case we must be sure to test the case when c is zero.

    How Should we Fix this Bug?

    One way would be to replace the body of is_vowelpair with the statement:

    return is_vowel(*p) && *(p+1) != '\0' && is_vowel(*(p+1));

    This will fix the bug, and in one sense it is quite correct, because it simply compensates for the slightly awkward behavior of strchr in our context. Another interpretation might be that the error lies in is_vowel, which classifies the null character as a vowel, and that this function should be changed to correct the error (I hear Bertrand Meyer -- see footnote 9 -- in the background chiding us for an insufficiently precisely defined precondition for is_vowel)9. I maintain, however, that either of these changes treats the symptom rather than the disease. Our original suspicion of (p+1) was justified; I contend that the proper precondition for is_vowelpair is that both p and (p+1) point to actual characters within the string, not to the terminator. Therefore a correct fix for the bug is to leave is_vowelpair as it is and replace the for statement in count_vowelpairs with this:

    for (; *s != '\0' && *(s+1) != '\0'; s++)

    It is necessary to test both characters because testing only *(s+1) will fail if the string is null (for then *(s+1) would reference an invalid address). This is a bit clumsy, and we might want to consider testing for a null string before entering the loop or using a different termination condition, perhaps one based directly on the length of the string. We might also want to consider weakening the precondition for is_vowel by making it classify the null character as not a vowel. As the saying goes, these things are left as exercises for the reader.

    More Development Cycles Articles
    More By The Rational Edge


     

    DEVELOPMENT CYCLES ARTICLES

    - Branch and Bound Algorithm Technique
    - Dynamic Programming Algorithm Technique
    - Genetic Algorithm Techniques
    - Greedy Strategy as an Algorithm Technique
    - Divide and Conquer Algorithm Technique
    - The Backtracking Algorithm Technique
    - More Pattern Matching Algorithms: B-M
    - Pattern Matching Algorithms Demystified: KMP
    - Coding Standards
    - A Peek into the Future: Transactional Memory
    - Learning About the Graph Construct using Gam...
    - Learning About the Graph Construct using Gam...
    - Learning About the Graph Construct using Gam...
    - How to Strike a Match
    - Entity Relationship Modeling






    © 2003-2008 by Developer Shed. All rights reserved. DS Cluster 6 hosted by Hostway
    Stay green...Green IT