阅读视图

发现新文章,点击刷新页面。

iOS客户端开发基础知识——写文件避“坑”指南(二)

更多精彩文章,欢迎关注作者微信公众号:码工笔记

一、背景 & 问题

上一篇文章讲过,在iOS、macOS平台上,要保证新写入的文件内容成功落盘,需要调用fcntl(fd, FULL_SYNC)(注:开源chromium里也是这么做的[1]):

FULL_SYNC

Does the same thing as fsync(2) then asks the drive to flush all buffered data to the permanent storage device (arg is ignored). As this drains the entire queue of the device and acts as a barrier, data that had been fsync'd on the same device before is guaranteed to be persisted when this call returns. This is currently implemented on HFS, MS-DOS (FAT), Universal Disk Format (UDF) and APFS file systems. The operation may take quite a while to complete. Certain FireWire drives have also been known to ignore the request to flush their buffered data.

从上面man page的描述可以看出,FULL_SYNC是将设备unified buffer里的数据全部强制落盘,因为buffer中的数据可能不只包含刚刚写入的,可能还包含了之前写入的数据,虽然达到了持久化的目的,但时间不可控,可能会耗时很长,严重影响应用性能。

有没有什么优化方式呢?

二、F_BARRIERFSYNC

从应用开发者的角度,很多场景下并不需要这么强的落盘保证,大多数场景下,如果能保证写入顺序,也即先写入数据A,后写入数据B,如果后续读数据时读到了数据B,则A也一定存在,应用侧就可以自己做数据完整性检查了,从而可以做兜底逻辑。这样一来既能减少强制落盘对性能的影响,又能保证数据的完整性。

fcntlF_BARRIERFSYNC这个选项就是为了解决这个问题的。先看一下man page说明:

F_BARRIERFSYNC

Does the same thing as fsync(2) then issues a barrier command to the drive (arg is ignored). The barrier applies to I/O that have been flushed with fsync(2) on the same device before. These operations are guaranteed to be persisted before any other I/O that would follow the barrier, although no assumption should be made on what has been persisted or not when this call returns. After the barrier has been issued, operations on other FDs that have been fsync'd before can still be re-ordered by the device, but not after the barrier. This is typically useful to guarantee valid state on disk when ordering is a concern but durability is not. A barrier can be used to order two phases of operations on a set of file descriptors and ensure that no file can possibly get persisted with the effect of the second phase without the effect of the first one. To do so, execute operations of phase one, then fsync(2) each FD and issue a single barrier. Finally execute operations of phase two. This is currently implemented on HFS and APFS. It requires hardware support, which Apple SSDs are guaranteed to provide.

调用此方法后,系统虽不能保证数据是否真正落盘成功,但能保证写入的顺序,也即如果后写入的数据成功落盘,则先写入的数据一定已经落盘。

注:Apple的SSD都支持。

Apple的官方建议[2]是:如果有强落盘需求,可以用FULL_SYNC,但这会导致性能下降及设备损耗,如果只需要保证写入顺序,则建议用F_BARRIERFSYNC。

Minimize explicit storage synchronization

Writing data on iOS adds the data to a unified buffer cache that the system then writes to file storage. Forcing iOS to flush pending filesystem changes from the unified buffer can result in unnecessary writes to the disk, degrading performance and increasing wear on the device. When possible, avoid calling fsync(_:), or using the fcntl(_:_:) F_FULLFSYNC operation to force a flush.

Some apps require a write barrier to ensure data persistence before subsequent operations can proceed. Most apps can use the fcntl(_:_:) F_BARRIERFSYNC for this.

Only use F_FULLFSYNC when your app requires a strong expectation of data persistence. Note that F_FULLFSYNC represents a best-effort guarantee that iOS writes data to the disk, but data can still be lost in the case of sudden power loss.

三、例子:SQLite主线的问题和苹果的优化

SQLite是移动端最常用的文件数据库,读写文件是其功能的基石。SQLite是如何实现落盘的呢?看看SQLite仓库主线逻辑[3]:

#elif HAVE_FULLFSYNC
  if( fullSync ){
    rc = osFcntl(fd, F_FULLFSYNC, 0);
  }else{
    rc = 1;
  }
  /* If the FULLFSYNC failed, fall back to attempting an fsync().
  ** It shouldn't be possible for fullfsync to fail on the local
  ** file system (on OSX), so failure indicates that FULLFSYNC
  ** isn't supported for this file system. So, attempt an fsync
  ** and (for now) ignore the overhead of a superfluous fcntl call.
  ** It'd be better to detect fullfsync support once and avoid
  ** the fcntl call every time sync is called.
  */
  if( rc ) rc = fsync(fd);

#elif defined(__APPLE__)
  /* fdatasync() on HFS+ doesn't yet flush the file size if it changed correctly
  ** so currently we default to the macro that redefines fdatasync to fsync
  */
  rc = fsync(fd);

如果开了PRAGMA fullsync = ON,也是使用了F_FULLSYNC来保证写入成功。没开的话是使用fsync,这里应该是有问题的。

那iOS的libsqlite是怎么做的呢?这个库苹果没有开源,只能逆向看一下,搜搜相关的几个方法,应该在这一段汇编这里:

                                    loc_1b0d62f40:
00000001b0d62f40 682240F9               ldr        x8, [x19, #0x40]             ; CODE XREF=sub_1b0d62d34+444
00000001b0d62f44 880000B4               cbz        x8, loc_1b0d62f54

00000001b0d62f48 080140F9               ldr        x8, [x8]
00000001b0d62f4c 08A940B9               ldr        w8, [x8, #0xa8]
00000001b0d62f50 28FDFF35               cbnz       w8, loc_1b0d62ef4

                                    loc_1b0d62f54:
00000001b0d62f54 280C0012               and        w8, w1, #0xf                 ; CODE XREF=sub_1b0d62d34+528
00000001b0d62f58 1F0D0071               cmp        w8, #0x3
00000001b0d62f5c A80A8052               mov        w8, #0x55
00000001b0d62f60 08019F1A               csel       w8, w8, wzr, eq
00000001b0d62f64 69024239               ldrb       w9, [x19, #0x80]
00000001b0d62f68 3F011F72               tst        w9, #0x2
00000001b0d62f6c 69068052               mov        w9, #0x33
00000001b0d62f70 0101891A               csel       w1, w8, w9, eq
00000001b0d62f74 741A40B9               ldr        w20, [x19, #0x18]
00000001b0d62f78 A1000034               cbz        w1, loc_1b0d62f8c

00000001b0d62f7c FF0300F9               str        xzr, [sp, #0x170 + var_170]
00000001b0d62f80 E00314AA               mov        x0, x20
00000001b0d62f84 7B93C794               bl         0x1b3f47d70
00000001b0d62f88 60040034               cbz        w0, loc_1b0d63014

                                    loc_1b0d62f8c:
00000001b0d62f8c E00314AA               mov        x0, x20                      ; argument "fildes" for method imp___auth_stubs__fsync, CODE XREF=sub_1b0d62d34+580
00000001b0d62f90 9C7A0494               bl         imp___auth_stubs__fsync      ; fsync
00000001b0d62f94 00040034               cbz        w0, loc_1b0d63014

翻译成C语言伪代码:

// x19 is the context pointer (self/this)
// w1 is an input argument (flags)

// 1. Pre-check
struct SubObject* obj = self->ptr_40;
if (obj) {
    if (obj->ptr_0->status_a8 != 0) {
        goto loc_1b0d62ef4; // Busy/Error path
    }
}

// 2. Determine Sync Command
int fd = self->file_descriptor; // offset 0x18
int command = 0;

// Check config flag at offset 0x80
if (self->flags_80 & 0x02) {
    command = 0x33; // F_FULLFSYNC (51)
} 
else if ((w1 & 0x0F) == 3) {
    command = 0x55; // F_BARRIERFSYNC (85)
}

// 3. Try Specialized Sync
int result = -1;
if (command != 0) {
    // Likely fcntl(fd, command, 0)
    result = unknown_func_1b3f47d70(fd, command, 0); 
    
    if (result == 0) {
        goto success; // loc_1b0d63014
    }
}

// 4. Fallback to standard fsync
// Reached if command was 0 OR if specialized sync failed
result = fsync(fd);

if (result == 0) {
    goto success;
}

// ... handle error ...

可以看出这个逻辑中既有F_FULLSYNC又有F_BARRIERFSYNC。写了个简单demo验证了一下,PRAGMA fullsync = ON会用F_FULLSYNCPRAGMA fullsync = OFF用的是F_BARRIERFSYNC

所以,如果在苹果系统上使用自己编译的sqlite库时,需要注意把这个逻辑加上。

*总之,在iOS/macOS平台写文件的场景,需要考虑好对性能、稳定性的需求,选用合适的系统机制。

参考资料

❌