06/18/2015

Importing tens of thousands of files into M-Files

Overview

This is the first part of a blog series about a recent project working with M-Files. I will publish the next parts on my blog as well.

Prelude

In a recent project, I helped a client import tens of thousands of files into M-Files, an information management solution for documents and other information.

M-Files in a nutshell

M-Files allows you to create, edit and search for objects, which are instances of classes with certain properties. Usually an administrator manages the classes and users manage instances of those classes.

A class instance may have files attached, but it can also consist of metadata only, depending on the properties of its class.

M-Files runs on Windows and comes with a server component, client components (e.g. an Explorer extension), a web interface, APIs and documentation.

The Requirement

The schema was already finished, that is, all classes were already defined. The client needed a way to import tens of thousands of files from their NAS into M-Files and attach the correct metadata. With that many files, importing and tagging them manually was not an option, which is why they contacted me.

Setting up a development environment

First things first, I started by setting up a new test environment where I could develop the solution without affecting the production system. Since a fully functional trial version of all components can be downloaded from the M-Files website, the only things I still needed were the schema and the files from the client.

Early on we decided that the best option was to tag the files based on their full file paths, as the customer used a consistent naming scheme for their files and folder structure. This was fortunate: I could skip a content analysis of the files, because the full path would hold all relevant information. It also meant that I didn't need a full copy of all files - just the file paths were sufficient (see the hypothetical example after the list below).

This avoided two problems at once:

  • Needing full access to sensitive client data
  • Copying terabytes of data or touching the production environment
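
To make the path-based tagging idea a bit more concrete, here is a purely hypothetical example - the customer's real naming scheme is of course different and not shown here. A path like /nas/projects/ACME/2014/invoices/INV-0042.pdf already tells you the class, the customer, the year and a title:

    # Hypothetical example only - the real naming scheme differs.
    # Derive metadata for an M-Files object from a full file path.
    from pathlib import PurePosixPath

    def metadata_from_path(path: str) -> dict:
        parts = PurePosixPath(path).parts
        # parts: ('/', 'nas', 'projects', customer, year, doc_type, file name)
        return {
            "class": parts[5].rstrip("s").capitalize(),   # 'invoices' -> 'Invoice'
            "customer": parts[3],
            "year": int(parts[4]),
            "title": PurePosixPath(path).stem,            # 'INV-0042'
        }

    print(metadata_from_path("/nas/projects/ACME/2014/invoices/INV-0042.pdf"))
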
As the NAS was running on a Linux box, I wrote a short script that dumped the names of all files they wanted to import into a text file.
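
The script itself does not need to be anything fancy; a minimal sketch (the mount point and output file name are placeholders) could look like this:

    # Minimal sketch: walk the NAS share and write every file path into a
    # text file, one path per line. ROOT is a placeholder mount point.
    import os

    ROOT = "/mnt/nas/projects"
    with open("filelist.txt", "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(ROOT):
            for name in filenames:
                out.write(os.path.join(dirpath, name) + "\n")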

I then wrote a command-line tool that used that list to recreate the files on the virtual machine, with the difference that each file had no content - its size was 0 bytes.
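
In essence the tool just reads the list, maps each NAS path to a path on the virtual machine, creates the missing folders and touches an empty file. A rough sketch (the source prefix and target folder are placeholders; my actual tool differed in the details):

    # Rough sketch: recreate the customer's folder structure on the dev VM
    # as empty (0-byte) placeholder files. SOURCE_PREFIX and TARGET_ROOT
    # are hypothetical placeholders.
    import os

    SOURCE_PREFIX = "/mnt/nas/"          # prefix used in filelist.txt
    TARGET_ROOT = r"C:\ImportStaging"    # staging folder on the dev VM

    with open("filelist.txt", encoding="utf-8") as listing:
        for line in listing:
            path = line.strip()
            if not path:
                continue
            relative = path[len(SOURCE_PREFIX):].replace("/", os.sep)
            target = os.path.join(TARGET_ROOT, relative)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            open(target, "w").close()    # create the empty file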

Now the development environment was set up - even the files that I wanted to import looked just like they would to a client in the production environment.

So setting up my development environment looked like this:

  • Download M-Files
  • Spin up a new Windows-based virtual machine
  • Install M-Files on the virtual machine
  • Create a backup of the customer's vault
  • Restore the backup of the vault on my virtual machine
  • Create a list of files that the client wanted to import
  • Recreate those files (as empty placeholders) in the development environment

Getting to work

With the setup out of the way, I could now get to work on the solution.

M-Files supports a few APIs, including:

  • Native API
  • Web API
  • UI Extensibility Framework

In this case, the natural choice was the more mature native API, which is ActiveX/COM based: there was no requirement for GUI integration, and its speed advantage alone disqualified the Web API. The native API comes with a .NET wrapper that takes away the pain of interoperating with COM objects directly, which keeps iteration fast and cheap. It is also well documented - the documentation is included in the installation, in case you can't find it online.
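
Because the native API is COM based, it can be scripted from any COM-capable language, not only .NET. Purely as an illustration (the real import code comes in Part II), the sketch below shows roughly what connecting to a vault could look like from Python via pywin32; the ProgID, the parameterless Connect() call and the vault GUID are assumptions on my part and should be checked against the documentation that ships with the installation.

    # Rough sketch only: connect to a local M-Files server and log in to a
    # vault through the COM API using pywin32. The ProgID, the Connect()
    # defaults and the vault GUID below are assumptions - verify them
    # against the installed API documentation before relying on this.
    import win32com.client

    server = win32com.client.Dispatch("MFilesAPI.MFilesServerApplication")
    server.Connect()  # assumed defaults: localhost, current Windows user

    # Log in to the target vault by its GUID (placeholder value).
    vault = server.LogInToVault("{00000000-0000-0000-0000-000000000000}")
    print(vault.Name)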

Stay tuned for Part II

In the next part I will write about how I used the M-Files API and what the solution I came up with looked like...

If you have any questions, please feel free to leave a comment here and thank you for reading!

Take care,
Martin
