Problems with the .NET Uri Implementation

This document presents an analysis of two major problems that I discovered while using the .NET System.Uri class.

What is a URI?

URIs are defined by RFC 2396 as:

A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource.

URIs are a more general form of URLs and URNs. In other words, a URL is one type of URI.

Uri Equality

The first problem concerns checking two Uris for equality. This is the kind of thing you might do if you're writing a search engine and want to find out whether you've visited a Uri before. Comparing the string versions of the objects is a bad idea, since only some parts of a Uri are case-sensitive. For example, in an HTTP URL the host name is case-insensitive (after all, it's a domain name), whereas the path is case-sensitive. The two parts need to be compared separately.
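A minimal sketch of such a component-wise comparison follows. It is only an illustration: the UriComparison class and SameResource method are hypothetical names, and the choice of components to compare (scheme, host, port, path and query) is an assumption that suits HTTP URLs rather than a rule for every scheme.

```csharp
using System;

public static class UriComparison
{
    // Hypothetical helper: scheme and host are compared case-insensitively,
    // the port numerically, and the path and query case-sensitively.
    public static bool SameResource(Uri a, Uri b)
    {
        return string.Equals(a.Scheme, b.Scheme, StringComparison.OrdinalIgnoreCase)
            && string.Equals(a.Host, b.Host, StringComparison.OrdinalIgnoreCase)
            && a.Port == b.Port
            && string.Equals(a.AbsolutePath, b.AbsolutePath, StringComparison.Ordinal)
            && string.Equals(a.Query, b.Query, StringComparison.Ordinal);
    }

    public static void Main()
    {
        Uri a = new Uri("http://WWW.EXAMPLE.COM/test.cgi?q=1234");
        Uri b = new Uri("http://www.example.com/test.cgi?q=1234");
        Uri c = new Uri("http://www.example.com/TEST.cgi?q=1234");

        // Host differs only in case, so these should count as the same.
        Console.WriteLine(SameResource(a, b));
        // Path differs in case, so these should count as different.
        Console.WriteLine(SameResource(b, c));
    }
}
```

Fragments are deliberately left out of the comparison, since they are not part of the Uri proper.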

In any case, the code I was trying to write compared two Uri objects that differed only in one parameter of the query string. I kept getting weird results telling me that my Uris were equal when I knew that they weren't. I ended up writing the following code to test the problem:

Uri uri1 = new Uri("http://www.example.com/test.cgi?q=1234");
Uri uri2 = new Uri("http://www.example.com/test.cgi?q=5678");

if (uri1.Equals(uri2)) {
	Console.WriteLine("uri1 == uri2");
}
else {
	Console.WriteLine("uri1 != uri2");
}

To my amazement this still insisted that the two Uris were identical. I tested two Uri objects that differed in the path as well:

Uri uri1 = new Uri("http://www.example.com/test.cgi?q=1234");
Uri uri2 = new Uri("http://www.example.com/test1.cgi?q=5678");

if (uri1.Equals(uri2)) {
	Console.WriteLine("uri1 == uri2");
}
else {
	Console.WriteLine("uri1 != uri2");
}

This gave the expected result: the two Uris were different.

I added code to print out the two query strings for the first test:

Console.WriteLine("uri1.Query=" + uri1.Query);
Console.WriteLine("uri2.Query=" + uri2.Query);

Running this gave the expected result: uri1.Query=?q=1234 and uri2.Query=?q=5678.

I checked the documentation for the Equals method:

The Equals method compares two URI instances without regard to any fragments that they might contain. For instance, given the URIs http://www.contoso.com/index.htm#search and http://www.contoso.com/index.htm the Equals method would return true

It talks about fragments but there's no mention of the query string. Now, according to the RFC, equivalence of Uris is scheme dependent. There is no defined method for comparing two arbitrary Uris. Also, query strings are only found in Uris that fall into the class of "Generic Uris" (section 3 of the RFC). A Generic Uri has the following structure:

scheme://authority/path?query

If the .NET Uri class were intended to be a strict Uri representation then it should not provide a comparison method based on the content of the Uri, but base it on object identity instead. However, that class would be fairly uninteresting, and Microsoft have taken the obvious route of adding some additional functionality such as parsing the Uri into Authority, Path, Query and Fragment parts. It goes further and parses the Authority into Host, Port and UserInfo.

So somewhere in the implementation of the Uri class someone has forgotten to include the query part of a parsed Uri when writing the Equals method.
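Until that is fixed, one possible workaround is to check the query explicitly alongside Equals. The sketch below assumes the behaviour described above; UriWorkaround and EqualsIncludingQuery are hypothetical names, not part of the framework.

```csharp
using System;

public static class UriWorkaround
{
    // Hypothetical helper: lets Uri.Equals do its usual comparison, then
    // additionally requires the query strings to match exactly.
    public static bool EqualsIncludingQuery(Uri a, Uri b)
    {
        return a.Equals(b)
            && string.Equals(a.Query, b.Query, StringComparison.Ordinal);
    }

    public static void Main()
    {
        Uri uri1 = new Uri("http://www.example.com/test.cgi?q=1234");
        Uri uri2 = new Uri("http://www.example.com/test.cgi?q=5678");

        if (EqualsIncludingQuery(uri1, uri2)) {
            Console.WriteLine("uri1 == uri2");
        }
        else {
            Console.WriteLine("uri1 != uri2");
        }
    }
}
```

The extra check is harmless if Equals already takes the query into account, so the helper should behave the same either way.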

Fragment Parsing

I discovered the second problem while searching the web for a solution to the first problem. In this posting to one of the .NET newsgroups Keith Reimer asks:

I've noticed that if a Uri has both a query and a fragment, then the Uri.Fragment is "" and the Uri.Query contains both the query and the fragment.

The sole reply was:

This is by design, you have either query or fragement

Having worked in depth with Uris for a number of years, I couldn't believe what I was reading. This guy was claiming that the Uri class was designed not to parse valid Uris, including the Uri of his own posting on Google Groups. A quick test confirmed the behaviour:

uri.ToString()                                uri.Fragment   uri.Query
http://www.example.com/test.cgi?#1234         1234
http://www.example.com/test.cgi?q=1234                       ?q=1234
http://www.example.com/test.cgi#hello?q=1234  #hello?q=1234
http://www.example.com/test.cgi?q=1234#hello                 ?q=1234%23hello

The last example is clearly wrong. The fragment is not a part of the Uri and should be parsed out first. Once that is done, the query runs from the ? to the end of what remains. Again, from the RFC:

The term "URI-reference" is used here to denote the common usage of a resource identifier. A URI reference may be absolute or relative, and may have additional information attached in the form of a fragment identifier. However, "the URI" that results from such a reference includes only the absolute URI after the fragment identifier (if any) is removed and after any relative URI is resolved to its absolute form.

URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
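The parsing order described above can be sketched with plain string operations. This is an illustration only, not how the Uri class works internally; UriSplitter and Split are hypothetical names, and the input is assumed to be a well-formed URI reference.

```csharp
using System;

public static class UriSplitter
{
    // Hypothetical helper: splits a URI reference into the part before the
    // query, the query (with its leading '?') and the fragment (with its
    // leading '#'). Missing parts come back as empty strings.
    public static void Split(string uriReference,
                             out string rest, out string query, out string fragment)
    {
        // Step 1: everything from the first '#' onward is the fragment.
        int hash = uriReference.IndexOf('#');
        fragment = hash >= 0 ? uriReference.Substring(hash) : "";
        string withoutFragment = hash >= 0 ? uriReference.Substring(0, hash) : uriReference;

        // Step 2: in what remains, everything from the first '?' onward is the query.
        int mark = withoutFragment.IndexOf('?');
        query = mark >= 0 ? withoutFragment.Substring(mark) : "";
        rest = mark >= 0 ? withoutFragment.Substring(0, mark) : withoutFragment;
    }

    public static void Main()
    {
        string rest, query, fragment;
        Split("http://www.example.com/test.cgi?q=1234#hello",
              out rest, out query, out fragment);
        Console.WriteLine("uri.Query=" + query);
        Console.WriteLine("uri.Fragment=" + fragment);
    }
}
```

Because the fragment is removed first, a ? that appears after the # stays inside the fragment, and a # can never end up inside the query.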

Conclusion

Both problems are serious enough individually to make any developer of Internet technologies think twice about using the class. Taken together, they surely point to a serious deficiency in the implementation and testing of the class. These are fundamental problems with a class that is central to Microsoft's implementation of Web Services, XML parsing and ASP.NET.

My recommendation is to avoid using this class until Microsoft issues a service pack for it. That means don't use the .NET framework for building Web Services, parsing any kind of XML or constructing ASP-based web sites. While you're waiting for Microsoft to fix these bugs, it might be worth your while pondering the quality of implementation of the other several thousand classes.

Permalink: http://blog.iandavis.com/2002/04/problems-with-the-net-uri-implementation/
