(1) Adding a cache system in Pincho library. Probably the simplest cache

Usb Mass Storage devices usually use the FAT32 filesystem, which it kind of surprising due to how old FAT32 is but it lasts because FAT32 is Win, OSX and Linux compatible and not very difficult to implement in embedded devices.

FAT32 solved some limitations from previous FAT versions as limited file names and file sizes but still uses, as it names implies, a file allocation table to find the cluster that define a file.

FAT32 uses a linked list to allocate clusters for a given file as shown here. Clusters entries doesn’t need to be adjacent and one entry can point to a previous entry.

That simplicity comes with a cost, a Θ(n) cost to be exact, because you must traverse the FAT to find a empty cluster. For this reason FAT32 and other linked-list filesystems does not scale very well.

I was aware of this while developing Pincho but it shocked me when I started the first tests. While read operations performs pretty decent, write operations using a naive approach are painfully slow and it gets worse when the USB mass storage device gets crowded with files.

A extremely best case scenario of an allocation of three files that require three clusters.Although this portion of the FAT looks empty from the last inserted file that can’t be supposed in a real scenario because fragmentation. Allocation is this scenario will be Θ(3n), note that allocation is linear, as previously shown but a constant appears, that is equal to the number of clusters the file will use. In a fragmented device, allocation will be even slower

The way to fight with this issue is through caching. Caching is a world itself, with a lot of algorithms and approximations. An aggressive caching will use a lot of memory to store the state of most of the FAT to minimize the I/O operations but Android heap is limited, also consistency must be guaranteed between data in the cache and data in the device and that adds some complexity.

Pincho library is still in kind of experimental state so I would like to try different caching approaches. The first I implemented is probably the simplest, not by far the best, but it gets some improvements. It is simple and it has a small memory footprint.

This implementation assumes correctly that what we want to minimize the number of I/O operations. Storage devices are usually accessed in blocks, called sectors, of 512 bytes addressed by a 32-bit direction called Logic block addressing (LBA). Sectors that compose the FAT can store 128 cluster entries. Those sectors with enough free entries (What is enough must be decided, If sectors with just a few free entries are cached it may not be a good speed improvement) are stored in an Array, that guarantees that an I/O operation on the FAT will return at least some of the free clusters needed to store a file, avoiding traversing the filled parts of the FAT.
I have defined three levels, the low cache just cache enough space to store 100 mb of data, the medium cache try to cache good sectors of half of the cache and the high cache good sectors of the whole FAT. The main difference between these modes is the time needed to perform the cache.

It is an improvement but there is a lot of room for more improvements. For the next releases I have planned to add more sophisticated caching systems.

Happy coding!


Pincho: A USB Mass storage Android implementation without ROOT v0.1

Source code
Download the AAR file

Somedays ago I released a very simple app to transfer files between your phone and a USB Flash drive. Kind of a proof of concept but the most interesting is not the app itself, it is the library that powers it. I had very positive experiences releasing another library that handle USB to Serial chipsets without needing a rooted phone so I am excited to release this new open-source(MIT License) USB mass storage library for Android as Pincho (silly spanish name we use to name the USB sticks).

Pincho’s main objective is not only to handle the upper layer of the USB Mass storage stack (FAT32 at this moment). Also exposes the SCSI commands layer and even the Bulk-Only transport protocol.
USB connected devices must implement, besides the obvious Mass Storage class, the 0x06 subclass (SCSI protocol) and the 0x50 protocol (Bulk-Only).
Currently Pincho only supports FAT32 filesystem and a maximum of 4 partitions (Extended partitions still not supported).

Add Pincho to your Android project
The easiest way to use it right now is copying the .aar file to the libs folder and adding these lines in your build.gradle file.

dependencies {
// Other dependencies...
compile project(':usbmassstorageforandroid-release')

Virtual FileSystem methods
Although Pincho currently only implements FAT32, there is a defined class to abstract it. Adding new filesystems wouldn’t have to affect drastically previous written code. Here are the basic methods

UsbDevice mDevice;
UsbDeviceConnection mConnection;

// choose the device and initialize mConnection through UsbManager.openDevice(UsbDevice mDevice);
// http://developer.android.com/guide/topics/connectivity/usb/host.html

* Constants
public static final int CACHE_NONE = 0; // No cache
public static final int CACHE_LOW = 1; // Cache for a 100 Mbytes allocation
public static final int CACHE_MEDIUM = 2; // Cache half of the FAT
public static final int CACHE_HIGH = 3; // Cache the whole FAT

* Constructor definition
public VirtualFileSystem(UsbDevice mDevice, UsbDeviceConnection mConnection);

* Mount operation, return true if device was successfully mounted. BLOCKING OPERATION
public boolean mount(int index);

* Mount operation, return true if device was successfully mounted. BLOCKING OPERATION*
* cacheMode: 0 no cache, 1 low cache, 2 medium cache, 3 high cache.
public boolean mount(int index, int cacheMode)

* List only file names, return a List of Strings. NON BLOCKING OPERATION
public List<String> list();

* List complete information of files, return a List of VFSFile objects. NON BLOCKING OPERATION
public List<VFSFile> listFiles();

* Get current path, return a string linux-like formatted with the current path. NON BLOCKING OPERATION
public String getPath();

* Change dir specified by a name, folder must be inside the current folder, return true if operation was * ok. BLOCKING OPERATION.
public boolean changeDir(String dirName);

* Change dir specified by a VFSFile, folder must be inside the current folder, return true if operation was ok. BLOCKING OPERATION.
public boolean changeDir(VFSFile file);

* Change dir back, return true if operation was ok. BLOCKING OPERATION
public boolean changeDirBack();

* Write file, return true if file was written correctly. BLOCKING OPERATION
public boolean writeFile(File file); //java.io.File 

* Read file specified by a string, return an array of bytes. BLOCKING OPERATION
public byte[] readFile(String fileName);

* Read file specified by a VFSFile, return an array of bytes. BLOCKING OPERATION
public byte[] readFile(VFSFile file);

* Delete file specified by a string, return true if file was deleted. BLOCKING OPERATION
public boolean deleteFile(String fileName);

* UnMount the USB mass storage device, return true if was unmounted correctly. BLOCKING OPERATION
public boolean unMount();

File data is encapsulated in VFSFile class

public class VFSFile
    private String fileName;
    private boolean isReadOnly;
    private boolean isHidden;
    private boolean isSystem;
    private boolean isVolume;
    private boolean isDirectory;
    private boolean isArchive;
    private Date creationDate;
    private Date lastAccessedDate;
    private Date lastModifiedDate;
    private long size;

SCSI Interface
Pincho also allows you to call directly a number of SCSI calls which gives a lot of raw power. SCSI calls are used through a class SCSICommunicator that implements a buffer to queue every call so they are non-blocking. Current implemented calls are:

– Inquiry
– ReadCapacity10
– Read10
– RequestSense
– TestUnitReady
– Write10
– ModeSense10
– ModeSelect10
– FormatUnit
– PreventAllowRemoval

In order to receive any notification about a current SCSI operation a callback must be defined

private SCSIInterface scsiInterface = new SCSIInterface()
    public void onSCSIOperationCompleted(int status, int dataResidue)
      // status 0: completed successfully
      // status 1: Some error occurred

    public void onSCSIDataReceived(SCSIResponse response)
       // Data received in the data-phase.
       // Possible responses: SCSIInquiryResponse, SCSIModeSense10Response, SCSIRead10Response,
       //     SCSIReadCapacity10, SCSIReportLuns, SCSIRequestSense

    public void onSCSIOperationStarted(boolean status)
      // SCSI operation started

Create the SCSICommunicator object

SCSICommunicator comm;
comm = new SCSICommunicator(mDevice, mConnection);

And call whatever SCSI call you need. SCSI calls are particularly tricky as they have a lot of parameters and some of them are kind of obscure. Although I have in mind writing about them eventually Here is a good source of information about SCSI calls.

public void read10(int rdProtect, boolean dpo, boolean fua,
                   boolean fuaNv, int logicalBlockAddress,
                   int groupNumber, int transferLength)

public void requestSense(boolean desc, int allocationLength)

public void testUnitReady()

public void write10(int wrProtect, boolean dpo, boolean fua,
                    boolean fuaNv, int logicalBlockAddress, int groupNumber,
                    int transferLength, byte[] data)

public void modeSense10(boolean llbaa, boolean dbd, int pc,
                        int pageCode, int subPageCode, int allocationLength)

public void modeSelect10(boolean pageFormat, boolean savePages, int parameterListLength)

public void formatUnit(boolean fmtpinfo, boolean rtoReq, boolean longList,
                       boolean fmtData, boolean cmplst, int defectListFormat)

public void preventAllowRemoval(int lun, boolean prevent)

There is a lot of room for improvements. I am totally open to hear your thoughts about this. Some of the pending enhancements are:

– Implement a Caching system. Reading is not particularly slow as Pincho knows what LBA has to read to find the next cluster node in the FAT but writing is slower than it should be.
– Not every SCSI call is implemented.
– Support extended partitions.
– More asynchronous interface in VirtualFileSystem to do not have to worry about blocking operations. Defining a callback to receive notifications and data.
– Other Filesystems could be eventually added.

Update 08-19-15: A primitive cache added

There are probably more important things that I am missing so if you find them please let me know.

Happy coding 🙂