Use Bing API and MSDN metadata to generate code automagically (Part 1)

Working more and more with Garrett Serack on the CoApp project, I found myself needing to query MSDN in a non-traditional manner: programmatically. To be more precise, I needed access to the function prototypes of various APIs for code generation purposes. For the past few months, I tried a number of search facilities, web services, and even experimented with symbol server lookups. In the end, however, I settled on an unlikely solution: Bing.

To be more specific, I’m currently working on the CoApp-Trace tool which, powered by Microsoft Detours, assists in the documentation of existing open-source build processes. To do this, I have to – at a high level – intercept calls to original Microsoft Windows functions and redirect them to my own similar functions. Let’s go through an example of how I’d implement this in my project:

  1. Scope out the function I’d like to monitor, in this case NtCreateFile.
  2. Manually obtain the function prototype (NTSTATUS NtCreateFile( PHANDLE, ACCESS_MASK, … ))
  3. Cut/paste the prototype into my DLL and modify to meet my requirements

While not hard, it’s extremely time-consuming once your list of APIs reaches 10 or more. I know this firsthand, because I re-implemented over 200 APIs in my previous Windows Vista API on XP (VAIOXP) project. It’s a tedious, error-prone, and debilitating process.

To eliminate steps 1 and 2, and decrease the amount of tedium in step 3, I took advantage of MSDN’s rich (and little-known) metadata. A lot like Flickr, almost every page on MSDN is tagged and categorized with words and phrases. While not so useful for humans, this metadata is extremely useful to computers and, unlike with all the other search engines available today, Bing actually indexes it! Yes, …

Bing is the only major search provider left that indexes meta content.

So what’s this metadata we’re talking about? Taking NtCreateFile as an example again, let’s look it up on MSDN. It’s pretty standard stuff – you get a short description of what the function does, a function prototype, and a listing of all the parameters and behaviors. But let’s look at it again, from a computer’s perspective. Holy shnikes!

[Screenshot: snippet of the meta tags on the NtCreateFile MSDN page]

Just from scanning through a small snippet of metadata (included above), you can immediately see this NtCreateFile page has some of the following attributes:

  • API reference
  • Specific to NtCreateFile API
  • Specific to API that lives in ntdll.dll
  • Targets the Windows OS

Let’s run some queries now… wait, how do I search for just the meta information?

This is where all my hard work pays off for you. Over the course of 143 days (4 months 20 days), I’ve been harassing the Bing team for answers, indirectly through Scott Hanselman, and let me tell you – getting answers from these folks is as hard as nailing Jell-O to the wall.

Using the normal Bing search page, the undocumented syntax is: meta:name("value")

Here, meta is simply a keyword. Replace name with a meta tag’s name; for example, to find pages specific to the NtCreateFile function, we can use Search.MSHAttr.APIName. Lastly, replace value with… the value you’d like to search for in the content part of the meta tag. In this case, we’ll set it to NtCreateFile.

Here’s what the completed query looks like: meta:Search.MSHAttr.APIName("NtCreateFile"). Paste that into Bing and you should receive at least one hit, with the first hit being the exact page we’re interested in. Sweet, huh?
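If you’re going to generate these queries for a whole list of APIs, a tiny helper saves some typing. This is just a sketch: meta_query is a name I made up, and since I don’t know of any escaping rules for this undocumented syntax, it simply strips double quotes out of the value (an assumption, not documented behavior).

```python
def meta_query(name, value):
    """Build a Bing meta query of the form meta:name("value")."""
    # Assumption: no escaping syntax is known, so embedded quotes are dropped.
    cleaned = value.replace('"', "")
    return f'meta:{name}("{cleaned}")'


query = meta_query("Search.MSHAttr.APIName", "NtCreateFile")
print(query)  # meta:Search.MSHAttr.APIName("NtCreateFile")
```

Note the straight double quotes; curly quotes pasted from a word processor are different characters.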

Well, don’t get too excited. When trying to retrieve a more complex set of results, Bing makes a boo boo. For example, let’s try to find all the (documented) functions in ntdll.dll. So using the construct above, we should be able to use meta:Search.MSHAttr.APILocation("ntdll.dll"), right? Executing that query results in… wait. Only 2 hits? I think we all know ntdll.dll exports more than just two functions.

In discussions with Bing, again indirectly, it was brought to light that a host collapsing issue exists. (Judging by how it was brought up, it’s not high on the list of things to fix.) Host collapsing is just fancy talk for the combining of search results that all have the same parent page in common – think de-duplication. In our case, Bing is incorrectly merging all our expected results. To disable host collapsing when using Bing.com, you can do one of two things:

  1. Constrain your search to the MSDN site via the site: keyword (e.g. site:msdn.microsoft.com, site:msdn2.microsoft.com).
  2. Use the undocumented hc=0 URL parameter (e.g. put &hc=0 at the end of your URL).
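Both workarounds are easy to bake into a URL builder. Here’s a rough sketch using Python’s standard library. The function name bing_url and its parameters are my own invention, and hc=0 is undocumented, so treat its behavior as an assumption that may change without notice.

```python
from urllib.parse import urlencode

BING_SEARCH = "https://www.bing.com/search"


def bing_url(query, site=None, disable_host_collapsing=False):
    """Build a Bing.com search URL for the given query string."""
    if site:
        # Option 1: constrain the search with the site: keyword.
        query = f"{query} site:{site}"
    params = {"q": query}
    if disable_host_collapsing:
        # Option 2: the undocumented hc=0 parameter described above.
        params["hc"] = "0"
    return f"{BING_SEARCH}?{urlencode(params)}"


ntdll_query = 'meta:Search.MSHAttr.APILocation("ntdll.dll")'

url_with_hc = bing_url(ntdll_query, disable_host_collapsing=True)
url_with_site = bing_url(ntdll_query, site="msdn.microsoft.com")

print(url_with_hc)
print(url_with_site)
```

urlencode percent-encodes the colon, parentheses, and quotes, so the query survives the trip through the URL intact.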

Repeating our last search, with the hc=0 parameter in place, you’ll see a more useful set of results. In the second part of our adventure, we’ll combine our newfound meta searching knowledge with the Bing Search web service and C# to create a simple code-generation extension for Visual Studio 2010.