What Is a BSTR?
The BSTR type is actually a typedef which, in typical Windows include-file fashion, is made up of more typedefs and defines. You can follow the twisted path yourself, but here's what it boils down to:
typedef wchar_t * BSTR;
Hmmm. A BSTR is actually a pointer to Unicode characters. Does that look familiar? In case you don't recognize this, let me point out a couple of similar typedefs:
typedef wchar_t * LPWSTR;
typedef char * LPSTR;
So if a BSTR is just a pointer to characters, how is it different from the null-terminated strings that C++ programmers know so well? Internally, the difference is that there's something extra at the start and end of the string. The string length is maintained in a long variable just before the start address being pointed to, and the string always has an extra null character after the last character of the string. This null isn't part of the string, and you may have additional nulls embedded in the string.
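The layout just described can be pictured with a small portable model. This is only a sketch under stated assumptions: ModelChar, ModelAlloc, ModelLen, ModelFree, and NullTerminatedLen are names invented for illustration, not OLE functions, and the 4-byte byte-count prefix is an assumption of the model. Real code should measure strings with the Sys functions and never touch the prefix directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the 16-bit wide character used by BSTRs.
using ModelChar = char16_t;

// Size of the length prefix in this model (an assumption, not an OLE guarantee).
constexpr std::size_t kPrefixBytes = sizeof(std::uint32_t);

// Allocate [byte length][cch characters][null] and return a pointer to the
// first character, the way a BSTR points just past its prefix.
ModelChar* ModelAlloc(const ModelChar* src, std::uint32_t cch) {
    assert(src != nullptr);
    const std::uint32_t cb = static_cast<std::uint32_t>(cch * sizeof(ModelChar));
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(kPrefixBytes + cb + sizeof(ModelChar)));
    if (block == nullptr) return nullptr;
    std::memcpy(block, &cb, kPrefixBytes);        // length prefix, in bytes
    ModelChar* chars = reinterpret_cast<ModelChar*>(block + kPrefixBytes);
    std::memcpy(chars, src, cb);                  // embedded nulls are copied too
    chars[cch] = 0;                               // terminator, not counted in the length
    return chars;
}

// Length in characters, read from the prefix; a null pointer has length zero.
std::uint32_t ModelLen(const ModelChar* s) {
    if (s == nullptr) return 0;
    std::uint32_t cb = 0;
    std::memcpy(&cb, reinterpret_cast<const unsigned char*>(s) - kPrefixBytes,
                kPrefixBytes);
    return static_cast<std::uint32_t>(cb / sizeof(ModelChar));
}

// What a function that believes in null-terminated strings would report.
std::size_t NullTerminatedLen(const ModelChar* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Release the whole block, starting at the prefix.
void ModelFree(ModelChar* s) {
    if (s != nullptr) std::free(reinterpret_cast<unsigned char*>(s) - kPrefixBytes);
}
```

With a five-character string that has a null in the middle, ModelLen reports all five characters, while NullTerminatedLen stops at the embedded null and reports two.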
That's the technical difference. The philosophical difference is that the contents of BSTRs are sacred. You're not allowed to modify the characters except according to very strict rules that we'll get to in a minute. OLE provides functions for allocating, reallocating, and destroying BSTRs. If you own an allocated BSTR, you may modify its contents as long as you don't change its size. Because every BSTR is, among other things, a pointer to a null-terminated string, you may pass one to any string function that expects a read-only (const) C string. The rules are much tighter for passing BSTRs to functions that modify string buffers. Usually, you can only use functions that take a string buffer argument and a maximum length argument.
All the rules work on the honor system. A BSTR is a BSTR by convention. Real types can be designed to permit only legal operations. Later we'll define a C++ type called String that does its best to enforce the rules. The point is that BSTR servers are honor-bound to follow the rules so that BSTR clients can use strings without even knowing that there are rules.
The BSTR System Functions
My descriptions of the OLE BSTR functions are different from and, in my opinion, more complete than the descriptions in OLE documentation. I had to experiment to determine some behavior that was scantily documented, and I checked the include files to get the real definitions, so I am confident that my descriptions are valid and will work for you.
For consistency with the rest of the article, the syntax used for code in this section has been normalized to use Win32 types such as LPWSTR and LPCWSTR. The actual prototypes in OLEAUTO.H use const OLECHAR FAR * (ignoring the equivalent LPCOLESTR types). The original reasons for using OLECHAR pointers rather than LPCWSTRs don't matter for this article.
You need to read this section only if you want to fully understand how the String class (presented later) works. But you don't really need to understand BSTRs in order to use the String class.
BSTR SysAllocString(LPCWSTR wsz);
Given a null-terminated wide character string, allocates a new BSTR of the same length and copies the string to the BSTR. This function works for empty and null strings. If you pass in a null string, you get back a null string. You also get back a null string if there isn't enough memory to allocate the given string.
Example:
// Create BSTR containing "Text"
bs = SysAllocString(L"Text");
BSTR SysAllocStringLen(LPCWSTR wsz, unsigned len);
Given a null-terminated wide-character string and a maximum length, allocates a new BSTR of the given length and copies up to that length of characters from the string to the BSTR. If the length of the copied string is less than the given maximum length, a null character is written after the last copied character. The rest of the requested length is allocated, but not initialized (except that there will always be a null character at the end of the BSTR). Thus the string will be doubly null-terminated--once at the end of the copied characters and once at the end of the allocated space. If NULL is passed as the string, the whole length is allocated, but not initialized (except for the terminating null character). Don't count on allocated but uninitialized strings to contain null characters or anything else in particular. It's best to fill uninitialized strings as soon after allocation as possible. Be aware that the function copies the requested number of characters without checking the size of the source buffer. If you ask for more characters than the source holds, as the second example below does, the extra characters are whatever happens to follow the source in memory. To be safe, pass a source buffer at least as long as the requested length, or pass NULL and fill the BSTR yourself.
Example:
// Create BSTR containing "Te"
bs = SysAllocStringLen(L"Text", 2);
// Create BSTR containing "Text" followed by \0 and a junk character
bs = SysAllocStringLen(L"Text", 6);
BSTR SysAllocStringByteLen(LPCSTR sz, unsigned len);
Given a null-terminated ANSI string, allocates a new BSTR of the given length and copies up to that length of bytes from the string to the BSTR. The result is a BSTR with two ANSI characters crammed into each wide character. There is very little you could do with such a string, and therefore not much reason to use this function. It's there for string conversion operations such as Visual Basic's StrConv function. What you really want is a function that creates a BSTR from an ANSI string, but this isn't it. The function works like SysAllocStringLen if you pass a null pointer or a length greater than the length of the input string.
BOOL SysReAllocString(BSTR * pbs, LPCWSTR wsz);
Allocates a new BSTR of the same length as the given wide-character string, copies the string to the BSTR, frees the BSTR pointed to by the first pointer, and resets the pointer to the new BSTR. Notice that the first parameter is a pointer to a BSTR, not a BSTR. Normally, you'll pass a BSTR pointer with the address-of operator.
Example:
// Reallocate BSTR bs as "NewText"
f = SysReAllocString(&bs, L"NewText");
BOOL SysReAllocStringLen(BSTR * pbs, LPCWSTR wsz, unsigned len);
Allocates a new BSTR of the given length, and copies as many characters as fit of the given wide-character string to the new BSTR. It then frees the BSTR pointed to by the first pointer and resets the pointer to the new BSTR. Often the new pointer will be the same as the old pointer, but you shouldn't count on this. You can give the same BSTR for both arguments if you want to truncate an existing BSTR. For example, you might allocate a BSTR buffer, call an API function to fill the buffer, and then reallocate the string to its actual length.
Example:
// Create uninitialized buffer of length MAX_BUF.
BSTR bsInput = SysAllocStringLen(NULL, MAX_BUF);
// Call API function to fill the buffer and return actual length.
cch = GetTempPathW(MAX_BUF, bsInput);
// Truncate string to actual length.
BOOL f = SysReAllocStringLen(&bsInput, bsInput, cch);
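The allocate, fill, truncate sequence can be tried without Windows by using small stand-ins for the Sys functions. This is a sketch, not the OLE implementation: ModelAllocLen, ModelReAllocLen, ModelLen, ModelFree, and FillWithPath are hypothetical names, the 4-byte prefix is an assumption, and building the new block before freeing the old one is simply the model's way of letting the source be the same string that is being replaced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

using ModelChar = char16_t;                       // stand-in for a 16-bit wide char
constexpr std::size_t kPrefixBytes = sizeof(std::uint32_t);

// Stand-in for SysAllocStringLen: a null src leaves the characters uninitialized.
ModelChar* ModelAllocLen(const ModelChar* src, std::uint32_t cch) {
    const std::uint32_t cb = static_cast<std::uint32_t>(cch * sizeof(ModelChar));
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(kPrefixBytes + cb + sizeof(ModelChar)));
    if (block == nullptr) return nullptr;
    std::memcpy(block, &cb, kPrefixBytes);        // byte length goes in the prefix
    ModelChar* chars = reinterpret_cast<ModelChar*>(block + kPrefixBytes);
    if (src != nullptr) std::memcpy(chars, src, cb);
    chars[cch] = 0;                               // always terminated
    return chars;
}

// Stand-in for SysStringLen: character count from the prefix, zero for null.
std::uint32_t ModelLen(const ModelChar* s) {
    if (s == nullptr) return 0;
    std::uint32_t cb = 0;
    std::memcpy(&cb, reinterpret_cast<const unsigned char*>(s) - kPrefixBytes,
                kPrefixBytes);
    return static_cast<std::uint32_t>(cb / sizeof(ModelChar));
}

// Stand-in for SysFreeString.
void ModelFree(ModelChar* s) {
    if (s != nullptr) std::free(reinterpret_cast<unsigned char*>(s) - kPrefixBytes);
}

// Stand-in for SysReAllocStringLen. The new block is built before the old
// one is freed, so src may be the very string being replaced.
bool ModelReAllocLen(ModelChar** ps, const ModelChar* src, std::uint32_t cch) {
    assert(ps != nullptr);
    ModelChar* fresh = ModelAllocLen(src, cch);
    if (fresh == nullptr) return false;
    ModelFree(*ps);
    *ps = fresh;
    return true;
}

// Hypothetical buffer-filling API: writes a path and returns its length
// in characters, not counting the terminating null.
std::uint32_t FillWithPath(ModelChar* buf, std::uint32_t cchMax) {
    static const ModelChar path[] = u"C:\\TEMP\\";
    const std::uint32_t cch =
        static_cast<std::uint32_t>(sizeof(path) / sizeof(path[0]) - 1);
    if (buf == nullptr || cch > cchMax) return 0;
    std::memcpy(buf, path, (cch + 1) * sizeof(ModelChar));
    return cch;
}
```

Allocating 260 characters, filling the buffer, and then reallocating with the returned count leaves a string whose reported length matches the text actually written, which is the point of the truncation step.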
unsigned SysStringLen(BSTR bs);
Returns the length of the BSTR in characters. This length does not include the terminating null. This function will return zero as the length of either a null BSTR or an empty BSTR.
Example:
// Get character length of string.
cch = SysStringLen(bs);
unsigned SysStringByteLen(BSTR bs);
Returns the length of the BSTR in bytes, not including the terminating null. This information is rarely of any value. Note that if you look at the length prefix of a BSTR in a debugger, you'll see the byte length (as returned by this function) rather than the character length.
void SysFreeString(BSTR bs);
Frees the memory assigned to the given BSTR. The contents of the string may be completely freed by the operating system, or they may just sit there unchanged. Either way, they no longer belong to you and you had better not read or write to them. Don't confuse a deallocated BSTR with a null BSTR. The null BSTR is valid; the deallocated BSTR is not.
Example:
// Deallocate a string.
SysFreeString(bs);
BSTR SysAllocStringA(LPCSTR sz);
The same as SysAllocString, except that it takes an ANSI string argument. OLE doesn't provide this function; it's declared in BString.H and defined in BString.Cpp. Normally, you should only use this function to create Unicode BSTRs from ANSI character string variables or function return values. It works for ANSI string literals, but it's wasted effort because you could just declare Unicode literals and save yourself some run-time processing.
Example:
// Create BSTR containing "Text".
bs = SysAllocStringA(sz);
BSTR SysAllocStringLenA(LPCSTR sz, unsigned len);
The same as SysAllocStringLen, except that it takes an ANSI string argument. This is my enhancement function, declared in BString.H.
Example:
// Create BSTR containing six characters, some or all of them from sz.
bs = SysAllocStringLenA(sz, 6);
The Eight Rules of BSTR
Knowing what the BSTR functions do doesn't mean you know how to use them. Just as the BSTR type is more than its typedef implies, the BSTR functions require more knowledge than documentation states. Those who obey the rules live in peace and happiness. Those who violate them live in fear--plagued by the ghosts of bugs past and future.
The trouble is, these rules are passed on in the oral tradition; they are not carved in stone. You're just supposed to know. The following list is an educated attempt--based on scraps of ancient manuscripts, and revised through trial and error--to codify the oral tradition. Remember, it is just an attempt.
Rule 1: Allocate, destroy, and measure BSTRs only through the OLE API (the Sys functions).
Those who use their supposed knowledge of BSTR internals are doomed to an unknowable but horrible fate in future versions. (You have to follow the rules if you don't want bugs.)
Rule 2: You may have your way with all the characters of strings you own.
The last character you own is the last character reported by SysStringLen, not the last non-null character. You may fool functions that believe in null-terminated strings by inserting null characters in BSTRs, but don't fool yourself.
Rule 3: You may change the pointers to strings you own, but only by following the rules.
In other words, you can change those pointers with SysReAllocString or SysReAllocStringLen. The trick with this rule (and rule 2) is determining whether you own the strings.
Rule 4: You do not own any BSTR passed to you by value.
The only thing you can do with such a string is copy it or pass it on to other functions that won't modify it. The caller owns the string and will dispose of it according to its whims. A BSTR passed by value looks like this in C++:
void DLLAPI TakeThisStringAndCopyIt(BCSTR bsIn);
The BCSTR is a typedef that should have been defined by OLE, but wasn't. I define it like this in OleType.H:
typedef const wchar_t * const BCSTR;
If you declare input parameters for your functions this way, the C++ compiler will enforce the law by failing on most attempts to change either the contents or the pointer.
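A short sketch shows what the compiler will and won't allow with such a parameter. CountChar is a hypothetical function invented for this illustration, and it takes the length as a separate argument only so that the sketch compiles without the OLE headers; in real code the length would come from SysStringLen.

```cpp
#include <cassert>
#include <cstddef>

// The typedef from OleType.H: const characters behind a const pointer.
typedef const wchar_t * const BCSTR;

// Count occurrences of ch in the first cch characters of an input string.
// Hypothetical example function; cch stands in for SysStringLen(bsIn).
std::size_t CountChar(BCSTR bsIn, std::size_t cch, wchar_t ch) {
    assert(bsIn != nullptr || cch == 0);   // a null string must have length zero
    std::size_t n = 0;
    for (std::size_t i = 0; i < cch; ++i) {
        if (bsIn[i] == ch) ++n;            // reading the characters is allowed
    }
    // bsIn[0] = L'X';                     // would not compile: characters are const
    // bsIn = nullptr;                     // would not compile: pointer is const
    return n;
}
```

Uncommenting either of the two marked lines produces a compile-time error, which is the enforcement the typedef is meant to buy. A cast can still defeat it, so the honor system hasn't gone away entirely.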
The Object Description Language (ODL) statement for the same function looks like this:
void WINAPI TakeThisStringAndCopyIt([in] BCSTR bsIn);
The BCSTR type is simply an alias for BSTR because MKTYPLIB doesn't recognize const. The [in] attribute allows MKTYPLIB to compile type information indicating the unchangeable nature of the BSTR. OLE clients such as Visual Basic will see this type information and assume you aren't going to change the string. If you violate this trust, the results are unpredictable.
Rule 5: You own any BSTR passed to you by reference as an in/out parameter.
You can modify the contents of the string, or you can replace the original pointer with a new one (using SysReAlloc functions). A BSTR passed by reference looks like this in C++:
void DLLAPI TakeThisStringAndGiveMeAnother(BSTR * pbsInOut);
Notice that the parameter doesn't use BCSTR because both the string and the pointer are modifiable. In itself the prototype doesn't turn a reference BSTR into an in/out BSTR. You do that with the following ODL statement:
void WINAPI TakeThisStringAndGiveMeAnother([in, out] BSTR * pbsInOut);
The [in, out] attribute tells MKTYPLIB to compile type information indicating that the string will have a valid value on input, but that you can modify that value and return something else if you want. For example, your function might do something like this:
// Copy input string.
bsNew = SysAllocString(*pbsInOut);
// Replace input with different output.
f = SysReAllocString(pbsInOut, L"Take me home");
// Use the copied string for something else.
UseString(bsNew);
Rule 6: You must create any BSTR passed to you by reference as an out string.
The string parameter you receive isn't really a string--it's a placeholder. The caller expects you to assign an allocated string to the unallocated pointer, and you'd better do it. Otherwise the caller will probably crash when it tries to perform string operations on the uninitialized pointer. The prototype for an out parameter looks the same as one for an in/out parameter, but the ODL statement is different:
void WINAPI TakeNothingAndGiveMeAString([out] BSTR * pbsOut);
The [out] attribute tells MKTYPLIB to compile type information indicating that the string has no valid input but expects valid output. A container such as Visual Basic will see this attribute and will free any string assigned to the passed variable before calling your function. After the return the container will assume the variable is valid. For example, you might do something like this:
// Allocate an output string.
*pbsOut = SysAllocString(L"As you like it");
Rule 7: You must create a BSTR in order to return it.
A string returned by a function is different from any other string. You can't just take a string parameter passed to you, modify the contents, and return it. If you did, you'd have two string variables referring to the same memory location, and unpleasant things would happen when different parts of the client code tried to modify them. So if you want to return a modified string, you allocate a copy, modify the copy, and return it. You prototype a returned BSTR like this:
BSTR DLLAPI TransformThisString(BCSTR bsIn);
The ODL version looks like this:
BSTR WINAPI TransformThisString([in] BSTR bsIn);
You might code it like this:
// Make a new copy.
BSTR bsRet = SysAllocString(bsIn);
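// Transform copy (uppercase it). The copy is null if bsIn was null
// or if the allocation failed, so check before touching it.
if (bsRet != NULL) _wcsupr(bsRet);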
// Return copy.
return bsRet;
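The copy, transform, return pattern can be exercised in portable C++ with a stand-in allocator. This is a sketch only: DupWide is a hypothetical substitute for SysAllocString and TransformCopy is a hypothetical name. The model measures strings with wcslen, so unlike a real BSTR it stops at the first embedded null.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <cwctype>

// Hypothetical stand-in for SysAllocString: returns a fresh copy, or null
// for a null input or a failed allocation.
wchar_t* DupWide(const wchar_t* src) {
    if (src == nullptr) return nullptr;
    const std::size_t cch = std::wcslen(src);
    wchar_t* copy = static_cast<wchar_t*>(
        std::malloc((cch + 1) * sizeof(wchar_t)));
    if (copy != nullptr) std::wmemcpy(copy, src, cch + 1);   // include terminator
    return copy;
}

// Return an uppercased copy. The caller owns the result (release it with
// std::free in this model); the input string is never modified.
wchar_t* TransformCopy(const wchar_t* in) {
    wchar_t* ret = DupWide(in);
    if (ret == nullptr) return nullptr;                      // null in, null out
    for (wchar_t* p = ret; *p != L'\0'; ++p) {
        *p = static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(*p)));
    }
    return ret;
}
```

Because the result lives in its own allocation, the caller can change or free it without disturbing the string that was passed in, and the original owner can do the same with the input.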
Rule 8: A null pointer is the same as an empty string to a BSTR.
Experienced C++ programmers will find this concept startling because it certainly isn't true of normal C++ strings. An empty BSTR is a pointer to a zero-length string. It has a single null character to the right of the address being pointed to, and a long integer containing zero to the left. A null BSTR is a null pointer pointing to nothing. There can't be any characters to the right of nothing, and there can't be any length to the left of nothing. Nevertheless, a null pointer is considered to have a length of zero (that's what SysStringLen returns).
When dealing with BSTRs, you may get unexpected results if you fail to take this into account. When you receive a string parameter, keep in mind that it may be a null pointer. For example, Visual Basic 4.0 makes all uninitialized strings null pointers. Many C++ run-time functions that handle empty strings without any problem fail rudely if you try to pass them a null pointer. You must protect any library function calls:
if (bsIn != NULL) {
    // Append only as many characters as bsRet still has room for.
    wcsncat(bsRet, bsIn, SysStringLen(bsRet) - wcslen(bsRet));
}
When you call Win32 API functions that expect a null pointer, make sure you're not accidentally passing an empty string:
cch = SearchPath(SysStringLen(bsPath) ? bsPath : (BSTR)NULL, bsBuffer,
                 SysStringLen(bsExt) ? bsExt : (BSTR)NULL, cchMax, bsRet, pBase);
When you return strings (either in return values or through out parameters), keep in mind that the caller will treat null pointers and empty strings the same. You can return whichever is most convenient. In other words, you have to clearly understand and distinguish between null pointers and empty strings in your C++ functions so that callers can ignore the difference in Basic.
In Visual Basic, a null pointer (represented by the constant vbNullString) is equivalent to an empty string. Therefore, the following statement prints True:
Debug.Print vbNullString = ""
If you need to compare two strings in a function designed to be called from Visual Basic, make sure you respect this equality.
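One way to respect that equality is to compare by counted length and never dereference a pointer whose length is zero. The sketch below is an assumption-laden illustration, not an OLE function: SameString is a hypothetical name, it does a plain binary comparison, and the lengths are passed in separately where real code would call SysStringLen.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Compare two counted strings, treating a null pointer as an empty string.
// Hypothetical helper; cchA and cchB stand in for SysStringLen results.
bool SameString(const wchar_t* a, std::size_t cchA,
                const wchar_t* b, std::size_t cchB) {
    assert(a != nullptr || cchA == 0);     // a null string must have length zero
    assert(b != nullptr || cchB == 0);
    if (cchA != cchB) return false;        // different lengths can't be equal
    if (cchA == 0) return true;            // null and empty both land here
    return std::wmemcmp(a, b, cchA) == 0;  // counted compare sees embedded nulls
}
```

Comparing by count rather than with wcscmp has a second benefit: two strings that differ only after an embedded null are correctly reported as different.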
Those are the rules. What is the penalty for breaking them? If you do something that's clearly wrong, you may just crash. But if you do something that violates the definition of a BSTR (or a VARIANT or SAFEARRAY, as we'll learn later) without causing an immediate failure, results vary.
When you're debugging under Windows NT (but not under Windows 95) you may hit a breakpoint in the system heap code if you fail to properly allocate or deallocate resources. You'll see a message box saying "User breakpoint called from code at 0xXXXXXXX" and you'll see an int 3 instruction pop up in the disassembly window with no clue as to where you are or what caused the error. If you continue running (or if you run the same code outside the debugger or under Windows 95), you may or may not encounter a fate too terrible to speak of. This is not my idea of a good debugging system. An exception or an error dialog box would be more helpful, but something is better than nothing, which is what you get under Windows 95.