Java NIO - DirectBuffer and HeapBuffer

Questions:

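  1. A DirectBuffer lives in off-heap memory. Is that memory still user-space memory rather than kernel memory?
  2. In FileChannel's read(ByteBuffer dst) and write(ByteBuffer src), if the argument is a HeapBuffer, a temporary DirectBuffer is allocated and the data is copied through it instead of being transferred directly. Why?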

DirectBuffer

Java               |      native
                   |
 DirectByteBuffer  |     malloc'd
 [    address   ] -+-> [   data    ]
                   |

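 A DirectByteBuffer is itself a Java object and lives on the Java heap. It has a long field, address, that records the location of a block of native memory obtained by calling malloc(). So the DirectByteBuffer object itself is inside the (Java) heap, while the buffer that actually holds the data is outside the (Java) heap, in native memory. That memory is allocated by malloc(), so it is user-space memory, not kernel memory. (From R大's answer in the referenced article.)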
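The two kinds of buffer can be told apart from ordinary Java code through public ByteBuffer methods: isDirect() reports whether the contents live in native memory, and hasArray() reports whether a Java byte[] backs the buffer. A minimal sketch (the class name BufferKinds and the helper describe are illustrative only):

```java
import java.nio.ByteBuffer;

public class BufferKinds {

    /** Describes where a buffer's bytes live, using only the public ByteBuffer API. */
    static String describe(ByteBuffer b) {
        // isDirect(): true when the contents live in native (off-heap) memory
        // hasArray(): true only when a Java byte[] backs the buffer and it is not read-only
        return "direct=" + b.isDirect() + ", hasArray=" + b.hasArray();
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(48);         // HeapByteBuffer
        ByteBuffer direct = ByteBuffer.allocateDirect(48); // DirectByteBuffer

        System.out.println("heap:   " + describe(heap));   // heap:   direct=false, hasArray=true
        System.out.println("direct: " + describe(direct)); // direct: direct=true, hasArray=false

        // The heap buffer's data is an ordinary byte[] on the Java heap
        byte[] backing = heap.array();
        System.out.println("backing array length = " + backing.length); // 48

        // The direct buffer has no byte[] behind it, so array() throws
        try {
            direct.array();
        } catch (UnsupportedOperationException e) {
            System.out.println("direct.array() -> UnsupportedOperationException");
        }
    }
}
```

Because the direct buffer has no byte[] behind it, array() throws UnsupportedOperationException, which matches the diagram: the Java object only holds an address into malloc'd memory.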

DirectBuffer and HeapBuffer

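 Both are Buffers; the difference is that the former uses off-heap memory, while the latter uses memory inside the JVM heap. Because of this, FileChannel's internal implementation handles them differently when reading and writing. Below is some FileChannel usage code: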

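    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class FileChannelDemo {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile aFile = new RandomAccessFile("data/nio-data.txt", "rw");
                 FileChannel channel = aFile.getChannel()) {
                String newData = "New String to write to file..." + System.currentTimeMillis();

                // HeapByteBuffer: backed by a byte[] on the Java heap
                ByteBuffer buf = ByteBuffer.allocate(48);
                // DirectByteBuffer: backed by native (off-heap) memory; allocated only for comparison
                ByteBuffer directBuf = ByteBuffer.allocateDirect(48);

                buf.clear();
                buf.put(newData.getBytes());
                buf.flip();

                // write() may not drain the buffer in one call, so loop until it is empty
                while (buf.hasRemaining()) {
                    channel.write(buf);
                }
            }

            // Source file to read
            String src = "C:\\CloudMusic\\Circadian Eyes - Ferris Wheel.mp3";
            // Destination file to write
            String dst = "D:\\etc\\cas\\logs\\cas_audit.log";
            try (FileInputStream fis = new FileInputStream(src);
                 FileOutputStream fos = new FileOutputStream(dst);
                 FileChannel fc = fis.getChannel();
                 FileChannel out = fos.getChannel()) {
                MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
                // A freshly mapped buffer already has position 0 and limit == size,
                // so no flip() is needed before writing it out
                while (mbb.hasRemaining()) {
                    out.write(mbb);
                }
                // Ask the OS to flush the written data to the storage device
                out.force(false);
            }
        }
    }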

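 If channel.write in the code above is passed a HeapBuffer, a temporary DirectBuffer is allocated, the HeapBuffer's data is copied into that off-heap memory, and the actual I/O is then performed on the DirectBuffer. Why not copy the data from the HeapBuffer straight into the kernel? Because handing a reference to a Java byte[] to native code, and letting that code read the array contents directly, requires that the byte[] not move while the native code is using it; in other words, the array must be "pinned". The JVM's garbage collectors may move objects (for example when copying or compacting the heap), which changes their addresses, so native code still holding the old address would read or write the wrong memory. See R大's answer in the references for details.

 The following is OpenJDK's implementation of sun.nio.ch.IOUtil.write(FileDescriptor fd, ByteBuffer src, long position, NativeDispatcher nd):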

    static int write(FileDescriptor fd, ByteBuffer src, long position,
                     NativeDispatcher nd)
        throws IOException
    {
        if (src instanceof DirectBuffer)
            return writeFromNativeBuffer(fd, src, position, nd);

        // Substitute a native buffer
        int pos = src.position();
        int lim = src.limit();
        assert (pos <= lim);
        int rem = (pos <= lim ? lim - pos : 0);
        ByteBuffer bb = Util.getTemporaryDirectBuffer(rem);
        try {
            bb.put(src);
            bb.flip();
            // Do not update src until we see how many bytes were written
            src.position(pos);

            int n = writeFromNativeBuffer(fd, bb, position, nd);
            if (n > 0) {
                // now update src
                src.position(pos + n);
            }
            return n;
        } finally {
            Util.offerFirstTemporaryDirectBuffer(bb);
        }
    }
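To make the control flow of IOUtil.write easier to follow outside the JDK, here is a simplified, self-contained sketch of the same substitute-a-direct-buffer idea. The NativeWriter interface and the partialWriter fake are hypothetical stand-ins for the real native dispatcher, and unlike OpenJDK, which reuses cached temporary buffers via Util.getTemporaryDirectBuffer, this sketch simply allocates a fresh direct buffer on every call:

```java
import java.nio.ByteBuffer;

public class TempDirectBufferSketch {

    /** Hypothetical stand-in for the native dispatcher: it only accepts direct buffers. */
    interface NativeWriter {
        int writeDirect(ByteBuffer direct);
    }

    /** Hypothetical fake that "writes" at most maxPerCall bytes, mimicking a partial write. */
    static NativeWriter partialWriter(int maxPerCall) {
        return direct -> {
            if (!direct.isDirect()) {
                throw new IllegalArgumentException("native side needs a direct buffer");
            }
            int n = Math.min(maxPerCall, direct.remaining());
            direct.position(direct.position() + n);
            return n;
        };
    }

    /** Mirrors the shape of IOUtil.write: direct buffers pass straight through, heap buffers are copied first. */
    static int write(ByteBuffer src, NativeWriter nd) {
        if (src.isDirect()) {
            return nd.writeDirect(src);
        }
        int pos = src.position();
        int rem = src.remaining();
        // Simplification: OpenJDK takes this buffer from a per-thread cache instead of allocating it
        ByteBuffer bb = ByteBuffer.allocateDirect(rem);
        bb.put(src);        // copy heap -> native; this moves src.position to src.limit
        bb.flip();
        src.position(pos);  // roll back until we know how many bytes were really written
        int n = nd.writeDirect(bb);
        if (n > 0) {
            src.position(pos + n);
        }
        return n;
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.wrap("hello world".getBytes());
        int n = write(heap, partialWriter(5));
        System.out.println("written=" + n + ", src.position=" + heap.position());
        // prints "written=5, src.position=5"
    }
}
```

With a writer that accepts only 5 bytes per call, the heap buffer's position advances by 5, not by the 11 bytes copied into the temporary buffer; that is exactly why the real code resets src.position(pos) before the native write and only advances it afterwards.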

MappedByteBuffer

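 MappedByteBuffer is the superclass of DirectByteBuffer (DirectBuffer itself is an internal interface, sun.nio.ch.DirectBuffer, which DirectByteBuffer implements). For channel I/O, direct buffers perform better than HeapByteBuffer because they avoid the extra copy into a temporary direct buffer shown above, which is also why FileChannel's internal implementation works with DirectByteBuffer. How MappedByteBuffer works is mainly a matter of the operating system's virtual memory, and more specifically of page tables, so it is worth first reading an article on that topic.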
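As a minimal illustration of mapping a file into memory, the sketch below writes a small temporary file, maps it with FileChannel.map in READ_ONLY mode, and reads the bytes back through the resulting MappedByteBuffer. The class name MappedReadSketch, the helpers readViaMapping and roundTrip, and the temp-file prefix are illustrative assumptions, not part of any existing API:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedReadSketch {

    /** Maps the whole file read-only and copies its contents out of the mapping. */
    static byte[] readViaMapping(Path file) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            // The file's pages are mapped into the process address space; on typical
            // OSes they are faulted in lazily the first time they are touched.
            MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            byte[] out = new byte[mbb.remaining()];
            mbb.get(out); // reads go through the mapped memory, not through read() calls
            return out;
        }
    }

    /** Writes data to a temporary file, reads it back via a mapping, then cleans up. */
    static byte[] roundTrip(byte[] data) {
        try {
            Path tmp = Files.createTempFile("mapped-demo", ".bin");
            try {
                Files.write(tmp, data);
                return readViaMapping(tmp);
            } finally {
                // Note: on Windows this delete can fail while the mapping is still alive
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] back = roundTrip("hello mmap".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(back, StandardCharsets.UTF_8)); // prints "hello mmap"
    }
}
```

The Javadoc describes MappedByteBuffer as a direct byte buffer whose content is a memory-mapped region of a file, so the copy in mbb.get(out) reads straight from the mapped pages rather than through a temporary buffer.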

Addendum

 Explanations of heap memory and native memory, from Stack Overflow:

  1. Heap memory: memory within the JVM process that is managed by the JVM to represent Java objects.

  2. Native memory/Off-heap: memory allocated within the process's address space that is not within the heap.

  3. Direct memory: similar to native memory, but also implies that an underlying buffer within the hardware is being shared, for example a buffer in the network adapter or graphics display. The goal is to reduce the number of times the same bytes are copied around in memory.

Finally, depending on the OS, extra native allocations (claiming more of the address space) can be made via Unsafe allocation and/or by memory-mapping a file. Memory-mapping a file is especially interesting because it can easily allocate more memory than the machine currently has as physical RAM. Also note that the total address space is limited by the size of the pointer being used: a 32-bit pointer cannot address beyond 4 GB. Period.
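The JDK exposes how much direct and mapped memory is in use through java.lang.management.BufferPoolMXBean. The sketch below assumes a HotSpot-based JVM, where the pools are named "direct" and "mapped"; the class name DirectMemoryStats and the helper poolCapacity are illustrative. It allocates a direct buffer and reads the "direct" pool's total capacity, which counts memory allocated outside the Java heap:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryStats {

    /** Returns the total capacity in bytes of the named buffer pool, or -1 if no such pool exists. */
    static long poolCapacity(String name) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(name)) {
                return pool.getTotalCapacity();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = poolCapacity("direct");
        ByteBuffer direct = ByteBuffer.allocateDirect(1 << 20); // 1 MiB of native memory
        long after = poolCapacity("direct");
        System.out.println("direct pool capacity: " + before + " -> " + after
                + " bytes (buffer capacity " + direct.capacity() + ")");
    }
}
```

The same loop with "mapped" reports memory held by MappedByteBuffers, which makes it easy to watch off-heap usage that heap statistics alone would not show.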

References

  • https://www.zhihu.com/question/57374068 (recommended reading)