Friday, November 4, 2011

Going Solr

Over the last couple of weeks, we whipped up an enterprise search site using Apache Solr. It was a fantastic experience. Solr is impressive. My first proper exposure to full text search (FTS) technology was via its inclusion in a recent version of PostgreSQL (see earlier post). I quickly learned that incorporating FTS does wonders for user satisfaction. So upon discovering the large number of Solr-powered public websites, we had to take Solr for a spin ourselves. We were not disappointed.

Some interesting notes about our experience:
  • Solr is not a turn-key solution; it's a search engine for a solution you create.
  • Doing the tutorial was critical to understanding Solr.
  • XSLT worked great for rendering Solr's XML results.
  • Some helpful documentation sources we found:
    o Apache Solr (tutorial and presentations);
    o Lucid Imagination (LucidWorks reference guide); and
    o w3schools.com (xslt tutorial).
  • Solr is lightning fast.
  • Solr is packed with enterprise features. However, like most open source tools, you must have an adventurous spirit to realize them.
  • Jetty makes deploying web / app servers simple. Launching the Solr example app was literally a one-liner. Incredible.
Tip: If you're implementing an enterprise search engine, consider using Apache Solr.

20 Jan 2012 Update - Posted the detailed 'how-to' for indexing office documents with Solr (source code and all). Enjoy.

Friday, October 14, 2011

Fastest Way to Learn



Here's the fastest way I've found to learn the core skills of an IT architect / specialist. Unfortunately and with much embarrassment, I didn't figure this out on my own. It took that huge project failure mentioned earlier, a great boss, and caring mentors to set me on the right path:
  1. Find an experienced team with a history of delivering projects into production, and do whatever it takes to get on that team.
  2. Stay with the project through delivery to production. Even better, stay with the project through the first service release.
  3. Be a major contributor. Shift roles as necessary to maintain this status (e.g. requirements lead to design lead to testing lead).
  4. Keep your eyes and ears open. Roll up your sleeves and learn everything you can from the 'stars' on that team.
Repeat this as often as you can throughout your career to continue your growth and keep your skills current.

Lesson learned: To accelerate learning and keep your skills current, serve as a major contributor on an accomplished delivery team throughout a project's life-cycle. Repeat.

16 May 2012 Update: Be sure to read the Fastest Way to Learn Part 2.

Thursday, September 29, 2011

AQT Saves the Day


AQT has several time saving features like Export
Relational databases are pervasive in the enterprise. More, they come in all sorts of shapes and sizes in the same shop - from modest desktop DBs like Access / Excel, to open source workhorses like PostgreSQL / MySQL, to top-shelf engines like Oracle / MS-SQL / DB2, to old powerhouses like Informix.

Wouldn't it be nice if you could have just one tool to interact with all of these databases: construct queries, view data, import / export data, perform administration, compare / move data between databases, etc.? Advanced Query Tool (AQT) is that tool. Like the SQL Cookbook, AQT comes highly recommended by those same 'scary good' DB guys.

I've found AQT to be especially valuable when working with: (1) old beasts that don't have nice interactive query tools, and (2) newer beasts that require access to the DB's installation media and heavyweight installs. Our team here uses AQT every day for developing, testing, and supporting the State's production systems.

Note: The inspiration for this post was from one of my colleagues. He used AQT to verify the recovery of image files into blob fields on an Informix database. You can imagine the command line wizardry this would require had he used the database's built-in tools.
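A hedged sketch of that kind of verification check (the table and column names here are hypothetical, and blob functions vary by database engine):

```sql
-- Hypothetical sanity check after recovering image files into blob fields:
-- count the rows, the rows with a blob at all, and the rows whose blob is
-- non-empty. All three counts should match the expected record count.
SELECT COUNT(*)                            AS total_rows,
       COUNT(image_blob)                   AS rows_with_blob,
       SUM(CASE WHEN LENGTH(image_blob) > 0
                THEN 1 ELSE 0 END)         AS non_empty_blobs
FROM   document_image;
```

With a tool like AQT, a check like this is a few clicks instead of command line wizardry.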

5 Oct 2011 Update - AQT came to our rescue twice more in the last couple of days. We used the data loader to recover about 14,000 records, and the data compare to identify duplicate records between production and archive databases.

Wednesday, September 14, 2011

TestLink is a Great Test Case Management Tool

There are dozens and dozens of testing tools out there in the wild. For large custom application development projects, I've found that using a tool to store and manage test cases is very valuable. It's just too resource intensive to do this work by hand. In the old days, we'd roll our own solution by using the document management features of a tool like Notes or Exchange. We'd customize them so that we could create test suites, enter test results, add build numbers, assign testers, etc.

This brings us to today. Surely, someone has built a great open source tool for test case management. Thankfully, the answer is a resounding YES. Of the choices available, we selected TestLink. It's open source. It's simple. It's proven (used by test shops at Intel, Lucent, Philips, Samsung, Symantec, Toshiba, Yahoo, etc.).

After using TestLink during the system testing of Wyoming's new excise tax system (went live July 2011), I can confidently recommend TestLink. Management likes it, the users like it, and their crazy architect likes it.

Discovery: If you plan on performing formal testing for a large project, consider using TestLink.

Wednesday, September 7, 2011

Use Cases Example

Continuing the example from the previous post, here's a selection of use cases from that project (click here or on the pic to see the detailed example). These were written using a modified form of the fully-dressed use case template from Writing Effective Use Cases. Some notes regarding the example:

o Version History - Evidence that use cases are living documents that evolve with your project;
o Use Case List - The table form of the use case diagram (often a PM's best friend); and
o (5) Use Cases - Implemented use cases. Note their brevity. Note how the complexity of multiple flows is accounted for in their extensions. Note how the main flow focuses on the frequently traveled path (the happy path).

How to Read Extensions: The extension number indicates the step in the main success scenario for which a different path may be taken. They are usually used to describe significant options you must provide or special failure handling that you must address (rather than just give an error message and end).

Low tech search tags: Real world use case example, actual use case example, use case list example, example using fully-dressed use case template, use cases from a real project, use cases from a successful project, great use case example, detailed use case example, professional use case example, recommended use case example, use case model example, use case model from a successful project, use case work product.

Tuesday, August 30, 2011

Use Cases Evolve


[Use case diagrams for each release: R1 Planned; R1 Implemented + R2 Planned; R2 Implemented + R3 Planned; R3 Implemented + Rx Planned]

The use case version history, use case list, and use case text for the above can be found here in a follow-up post.

Healthy systems evolve. With each release, you add the knowledge gained from the previous release and improve the next. So naturally, your use cases will reflect that. To illustrate this, take a look at the use case diagrams for a business registration system built for the Government of Samoa. Note how they progress from release to release. Note how the implemented use cases evolve from the planned.

This registration system continues to evolve. It has had 21 releases so far, with 5 major releases (each including several new or updated use cases). To ensure the right business functionality was built, we worked closely with the client business experts on the use cases as each major release progressed. New use cases were discovered, some existing ones were updated, and others were retired. This is normal for a healthy development cycle.

Note: The meat of the use cases is the text. To me, the diagrams are a visual form of the use case list: they do a nice job of listing the use cases, their actors, and functional groupings. I'm not a big fan of, nor do I recommend, the fancy UML include/extends/generalization constructs; it's just too easy to venture into design and forget to concentrate on the use case text.

Friday, August 26, 2011

SQL Cookbook to the Rescue

This week was rough. We were auditing complex financial records in one of our new production databases. To do so, we had to wield some advanced SQL. I had to study up on joins and a few other advanced techniques. How did I do it? I had a great coach: DB2 SQL Cookbook by Graeme Birchall. I found an old original PDF (here), a modern HTML fork (here), and more via Google (here).

On a previous project, some 'scary good' data mining experts and DBAs recommended this book. I've been using it ever since. Those guys were so right; this is an excellent resource for the practitioner. More, the content applies regardless of the database you use. Just adjust the syntax slightly for your target database engine and you'll be slaying queries like a pro in no time. Now here's the kicker: Graeme's book is free to download... so no excuses, go get that data!

Favorite recipe: Multiple Counts in One Pass
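The recipe boils down to conditional aggregation: wrap CASE expressions in SUMs so a single scan of the table yields several counts at once. A minimal sketch against a hypothetical ticket table (the table and status values are illustrative, not from the book):

```sql
-- One pass over the table produces the total plus per-status counts,
-- instead of running three separate COUNT queries.
SELECT COUNT(*)                                           AS total,
       SUM(CASE WHEN status = 'open'   THEN 1 ELSE 0 END) AS open_count,
       SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END) AS closed_count
FROM   ticket;
```

The same pattern works on DB2, PostgreSQL, Oracle, and friends with little or no syntax adjustment.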

13 Aug 2015 - Updated links (old ones were decommissioned).
31 Jan 2022 - Updated links again (same reason).

Monday, August 22, 2011

Black-Belt in Use Case Development

Years ago, I was an architect on a top ten world-wide IT project failure. One of my many mistakes was failing to properly collect requirements. Never again. After a lot of encouragement from a great boss and caring peers, I made a commitment to learn the best practices from accomplished veterans. First best practice on-deck: use cases. With some great teams, I mastered use case writing, led requirements efforts, and now have several successful systems in production today to show for it.

Mastering use case development is perhaps the best move I ever made toward becoming a better architect.

The secret: Writing Effective Use Cases (preview) by Alistair Cockburn.

Alistair is the Use Case master. His book is probably the most valuable IT book on my bookshelf. If you've ever struggled through gathering requirements, you'll appreciate its value immediately. You won't believe how many times you'll jump up and exclaim, "That happened to me! That is so true!" If you're a beginner, just know that this gem contains the best practices from years of actual project experience. When you follow it, you follow in the footsteps of real architects who get projects done.

As beginners, on the first project where we used this book, the business consultants and I did nightly reading assignments as we composed the use cases. Within a month or so, we had a complete set of use cases that were fantastic, provided immediate value, and proved to be a key success factor throughout the life of the project. So with a little homework, your team can do it too.

Lesson learned: Commit yourself to mastering the practice of use case development, and DO IT.

Monday, August 8, 2011

Utilize Full Text Search

About a year ago, I released a new version of a business registration system that had one major new feature: Full Text Search (FTS). In a long series of releases, it was the most successful feature I ever added. It achieved a huge increase in effective system use and user satisfaction - it was a major win all the way around.

The system already had an outstanding search bar that could search on any combination of fields and leverage the database's powerful regular expression engine. I put a lot of work into this fancy search bar, but in practice, its power went largely unused. Frustrating!

Months later, while upgrading the infrastructure of the system, I noticed that PostgreSQL had added full text search as part of its standard release. Hmmm... After digging a bit, I discovered how it tokenizes words so that you can get better search results. For example, searching for 'industry' would match 'industry, industries, industrial' while ignoring case and punctuation. Cool.

I thought, "Hey, maybe I should add this type of search to the search bar." After a week or two of work, wow. The search results were far better (orders of magnitude better). They were ranked. The matches were highlighted. Heck, it was like I added Google search to the system.

Here's all I had to do:
  • Choose source fields in tables to tokenize
  • Create matching tsvector fields in those same tables
    (a tsvector is a sorted list of distinct lexemes - normalized word tokens)
  • Create triggers to update the tsvector fields anytime their source fields are updated
  • Adjust the searches to match against these tsvector fields
  • Tune the database's FTS engine by customizing its dictionaries to the nature of the system's data
    (e.g. additional entries to better match company names: inc = incorporated = incorporation)
Note that the terminology varies by storage engine, but the technique remains largely the same.
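The steps above can be sketched in PostgreSQL roughly as follows. This is a minimal illustration, assuming a hypothetical company table with a name field; it uses the built-in tsvector_update_trigger helper and the stock english dictionary, and omits the custom dictionary tuning:

```sql
-- 1. Add a tsvector column alongside the source field.
ALTER TABLE company ADD COLUMN name_tsv tsvector;

-- 2. Tokenize the existing data, then keep the column current with a trigger.
UPDATE company SET name_tsv = to_tsvector('english', coalesce(name, ''));

CREATE TRIGGER company_tsv_update
  BEFORE INSERT OR UPDATE ON company
  FOR EACH ROW
  EXECUTE PROCEDURE tsvector_update_trigger(name_tsv, 'pg_catalog.english', name);

-- 3. Match against the tsvector field, with ranking and highlighting.
SELECT name,
       ts_rank(name_tsv, query)            AS rank,
       ts_headline('english', name, query) AS highlight
FROM   company, to_tsquery('english', 'industry') AS query
WHERE  name_tsv @@ query
ORDER  BY rank DESC;
```

A GIN index on the tsvector column keeps the matching fast as the tables grow.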

Lesson learned: If you're implementing a search function, utilize FTS.

4 Nov 2011 Update: See more recent post regarding using Apache Solr for enterprise search.