.. SPDX-License-Identifier: GPL-2.0
.. Blame: David Howells, fb28afc, 2021-02-22 13:17:24 +0000

=================================
NETWORK FILESYSTEM HELPER LIBRARY
=================================

.. Contents:

 - Overview.
 - Buffered read helpers.
   - Read helper functions.
   - Read helper structures.
   - Read helper operations.
   - Read helper procedure.
   - Read helper cache API.


Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations. For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server. The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.


Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->readpage(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handles transparent huge pages (THPs).

 * Insulates the netfs from VM interface changes.

 * Allows the netfs to arbitrarily split reads up into pieces, even ones that
   don't match page sizes or page alignments and that may cross pages.

 * Allows the netfs to expand a readahead request in both directions to meet
   its needs.

 * Allows the netfs to partially fulfil a read, which will then be resubmitted.

 * Handles local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handles clearing of bufferage that isn't on the server.

 * Handles retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations. This
includes a mandatory method to issue a read operation along with a number of
optional methods.
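
As a rough illustration of the "mandatory plus optional methods" arrangement,
the following userspace sketch models an operations table in which optional
entries may be left NULL. The ``model_request`` and ``model_ops`` structures
and the ``model_start_read()`` function are hypothetical stand-ins invented
for this example; they are not the kernel's types and only mimic the general
shape of the interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a read request. */
struct model_request {
	long long start;	/* file position of the read */
	size_t len;		/* length of the read */
	int issued;		/* number of issue_op() calls seen */
};

/* Hypothetical ops table: issue_op is mandatory, the rest are optional. */
struct model_ops {
	void (*init_rreq)(struct model_request *rreq);		/* optional */
	void (*expand_readahead)(struct model_request *rreq);	/* optional */
	void (*issue_op)(struct model_request *rreq);		/* mandatory */
};

/* Drive one request, skipping any optional method that was left NULL. */
static int model_start_read(struct model_request *rreq,
			    const struct model_ops *ops)
{
	if (!ops || !ops->issue_op)
		return -1;	/* the mandatory method is missing */
	if (ops->init_rreq)
		ops->init_rreq(rreq);
	if (ops->expand_readahead)
		ops->expand_readahead(rreq);
	ops->issue_op(rreq);
	return 0;
}

/* Example mandatory method: just record that a read was issued. */
static void model_issue(struct model_request *rreq)
{
	rreq->issued++;
}
```

A filesystem in this model would fill in ``issue_op`` and leave the entries it
does not need as NULL.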


Read Helper Functions
---------------------

Three read helpers are provided::

	* void netfs_readahead(struct readahead_control *ractl,
			       const struct netfs_read_request_ops *ops,
			       void *netfs_priv);
	* int netfs_readpage(struct file *file,
			     struct page *page,
			     const struct netfs_read_request_ops *ops,
			     void *netfs_priv);
	* int netfs_write_begin(struct file *file,
				struct address_space *mapping,
				loff_t pos,
				unsigned int len,
				unsigned int flags,
				struct page **_page,
				void **_fsdata,
				const struct netfs_read_request_ops *ops,
				void *netfs_priv);

Each corresponds to a VM operation, with the addition of a couple of parameters
for the use of the read helpers:

 * ``ops``

   A table of operations through which the helpers can talk to the filesystem.

 * ``netfs_priv``

   Filesystem private data (can be NULL).

Both of these values will be stored into the read request structure.

For ->readahead() and ->readpage(), the network filesystem should just jump
into the corresponding read helper; whereas for ->write_begin(), it may be a
little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired page if an
error occurs after calling the helper.
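
For illustration, the retry convention used by the optional
check_write_begin() operation (return 0 to proceed, -EAGAIN to have the page
regrabbed, any other error to abort) can be modelled in userspace. This is
only a sketch under stated assumptions: ``model_write_begin()``,
``model_check_fn`` and ``model_check_countdown()`` are hypothetical names, no
real pages are involved, and the ``max_tries`` bound exists only so that the
model always terminates.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback type: 0 = proceed, -EAGAIN = regrab, else abort. */
typedef int (*model_check_fn)(void *ctx);

/*
 * Keep "grabbing the page" and asking the filesystem until it either accepts
 * the page or reports a hard error.  max_tries bounds the loop in this model
 * only.
 */
static int model_write_begin(model_check_fn check, void *ctx, int max_tries)
{
	int tries, ret;

	for (tries = 0; tries < max_tries; tries++) {
		/* ... the page would be grabbed here ... */
		ret = check(ctx);
		if (ret == 0)
			return 0;	/* the page may now be modified */
		if (ret != -EAGAIN)
			return ret;	/* abort the operation */
		/* -EAGAIN: go round again and regrab the page */
	}
	return -EAGAIN;
}

/* Example check: ask for a regrab until the counter in ctx reaches zero. */
static int model_check_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```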

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations. Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs and netfs_priv is non-NULL, ops->cleanup() will be called to
deal with it. If some parts of the request are in progress when an error
occurs, the request will get partially completed if sufficient data is read.

Additionally, there is::

	* void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
				       ssize_t transferred_or_error,
				       bool was_async);

which should be called to complete a read subrequest. This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (ie. whether the follow-on processing can be
done in the current context, given this may involve sleeping).
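
The "bytes transferred or a negative error code" convention can be sketched
in userspace as below. The ``model_subreq`` structure and
``model_subreq_terminated()`` are hypothetical stand-ins for this example, not
the kernel function; in particular, clamping an over-reported byte count is a
choice made by this model alone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical stand-in for the state a subrequest would carry. */
struct model_subreq {
	size_t len;		/* length of the slice */
	size_t transferred;	/* bytes read so far */
	int error;		/* 0, or a negative errno value */
};

/*
 * Model of a completion call: a negative value is an error code, anything
 * else is a byte count added to the running total.  Returns true if the
 * slice is now complete.
 */
static bool model_subreq_terminated(struct model_subreq *subreq,
				    ssize_t transferred_or_error)
{
	if (transferred_or_error < 0) {
		subreq->error = (int)transferred_or_error;
		return false;
	}
	subreq->transferred += (size_t)transferred_or_error;
	if (subreq->transferred > subreq->len)
		subreq->transferred = subreq->len;	/* model-only clamp */
	return subreq->transferred == subreq->len;
}
```

A short first read followed by a second read covering the remainder would
therefore report completion only on the second call.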


Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read. The first is a structure that manages a read request as a whole::

	struct netfs_read_request {
		struct inode *inode;
		struct address_space *mapping;
		struct netfs_cache_resources cache_resources;
		void *netfs_priv;
		loff_t start;
		size_t len;
		loff_t i_size;
		const struct netfs_read_request_ops *netfs_ops;
		unsigned int debug_id;
		...
	};

The above fields are the ones the netfs can use. They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from. The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data. The value for this can be passed in
   to the helper functions or set during the request. The ->cleanup() op will
   be called if this is non-NULL at the end.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length. These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table. The value for this is passed into the
   helper functions.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.


The second structure is used to manage individual slices of the overall read
request::

	struct netfs_read_subrequest {
		struct netfs_read_request *rreq;
		loff_t start;
		size_t len;
		size_t transferred;
		unsigned long flags;
		unsigned short debug_index;
		...
	};

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another. The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of data transferred so far of the length of this slice. The
   network filesystem or cache should start the operation this far into the
   slice. If a short read occurs, the helpers will call again, having updated
   this to reflect the amount read so far.

 * ``flags``

   Flags pertaining to the read. There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (ie. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.
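
To make the roles of ``start``, ``len``, ``transferred`` and the clear-tail
flag more concrete, here is a small userspace sketch. The
``model_subreq_state`` structure, the ``MODEL_SREQ_CLEAR_TAIL`` bit value and
the helper functions are assumptions made for this example; the kernel's own
flag values and buffer handling are not shown in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_SREQ_CLEAR_TAIL 0x1UL	/* hypothetical flag bit for this model */

/* Hypothetical stand-in holding only the fields discussed above. */
struct model_subreq_state {
	long long start;	/* file position of the slice */
	size_t len;		/* length of the slice */
	size_t transferred;	/* bytes already read into the slice */
	unsigned long flags;
};

/* File position at which a (re)issued read should begin. */
static long long model_resume_pos(const struct model_subreq_state *s)
{
	return s->start + (long long)s->transferred;
}

/* Number of bytes of the slice still outstanding. */
static size_t model_remaining(const struct model_subreq_state *s)
{
	return s->len - s->transferred;
}

/*
 * If the clear-tail flag is set, zero the unread part of the slice's buffer
 * (from transferred to len) and count it as transferred.  buf[0] corresponds
 * to file position s->start.
 */
static void model_clear_tail(struct model_subreq_state *s, unsigned char *buf)
{
	if (!(s->flags & MODEL_SREQ_CLEAR_TAIL))
		return;
	memset(buf + s->transferred, 0, s->len - s->transferred);
	s->transferred = s->len;
}
```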


Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

	struct netfs_read_request_ops {
		void (*init_rreq)(struct netfs_read_request *rreq, struct file *file);
		bool (*is_cache_enabled)(struct inode *inode);
		int (*begin_cache_operation)(struct netfs_read_request *rreq);
		void (*expand_readahead)(struct netfs_read_request *rreq);
		bool (*clamp_length)(struct netfs_read_subrequest *subreq);
		void (*issue_op)(struct netfs_read_subrequest *subreq);
		bool (*is_still_valid)(struct netfs_read_request *rreq);
		int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
					 struct page *page, void **_fsdata);
		void (*done)(struct netfs_read_request *rreq);
		void (*cleanup)(struct address_space *mapping, void *netfs_priv);
	};

The operations are as follows:

 * ``init_rreq()``

   [Optional] This is called to initialise the request structure. It is given
   the file for reference and can modify the ->netfs_priv value.

 * ``is_cache_enabled()``

   [Required] This is called by netfs_write_begin() to ask if the file is being
   cached. It should return true if it is being cached and false otherwise.

 * ``begin_cache_operation()``

   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read. The netfs
   library module cannot access the cache directly, so the filesystem should
   call something like fscache_begin_read_operation() to do this.

   The cache gets to store its state in ->cache_resources and must set a table
   of operations of its own there (though of a different type).

   This should return 0 on success and an error code otherwise. If an error is
   reported, the operation may proceed anyway, just without local caching (only
   out of memory and interruption errors cause failure here).

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request. The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made. If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure. Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest. The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   As the method is declared to return bool, it should return true on success
   and false on error.

 * ``issue_op()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading. In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred. The filesystem also should not deal with setting pages
   uptodate, unlocking them or dropping their refs - the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the pages locked, but not pinned. It is possible
   to use the ITER_XARRAY iov iterator to refer to the range of the inode that
   is being operated upon without the need to allocate large bvec tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid. It should return true if it is still valid and false
   if not. If it's not still valid, it will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the page to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It should return 0 if everything is now fine, -EAGAIN if the page should be
   regrabbed and any other error code to abort the operation.

 * ``done()``

   [Optional] This is called after the pages in the request have all been
   unlocked (and marked uptodate if applicable).

 * ``cleanup()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up ->netfs_priv.
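
The expansion rule for expand_readahead() (the request may grow in both
directions, and ->len must grow by at least as much as ->start shrinks) can be
illustrated with a userspace sketch that rounds a request outwards to a fixed
granule. The ``model_rreq`` structure and ``model_expand_readahead()`` are
hypothetical, and the granule size passed in is an assumption of the example
rather than anything the library prescribes. The sketch assumes a
non-negative start position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only ->start and ->len. */
struct model_rreq {
	long long start;	/* file position of the request */
	size_t len;		/* length of the request */
};

/*
 * Round the request outwards to a multiple of 'gran' bytes (gran > 0).
 * The start only moves down and the end only moves up, so len grows by at
 * least as much as start shrinks and the original range stays covered.
 */
static void model_expand_readahead(struct model_rreq *rreq, size_t gran)
{
	long long g = (long long)gran;
	long long end = rreq->start + (long long)rreq->len;
	long long new_start = rreq->start - (rreq->start % g);
	long long new_end = ((end + g - 1) / g) * g;

	assert(new_start <= rreq->start && new_end >= end);
	rreq->start = new_start;
	rreq->len = (size_t)(new_end - new_start);
}
```

For example, a request for 3000 bytes at position 5000 with a 4096-byte
granule becomes a request for 4096 bytes at position 4096.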



Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request. This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source. This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the pages that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any pages that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the pages will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.
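
The slicing loop and the reassessment step above can be modelled in userspace
roughly as follows. Everything here is a hypothetical stand-in: the
``model_*`` names, the fixed ``rsize`` clamp and the outcome enumeration are
inventions of this sketch. What happens to a short read that makes no further
progress is not spelled out above, so the model simply fails the slice in that
case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_slice {
	long long start;	/* file position of the slice */
	size_t len;		/* length of the slice */
};

enum model_source { MODEL_FROM_CACHE, MODEL_FROM_SERVER };

enum model_action {
	MODEL_DONE,		/* slice fully read */
	MODEL_CLEAR_TAIL,	/* zero the rest of the slice */
	MODEL_REISSUE_SAME,	/* retry against the same source */
	MODEL_REISSUE_SERVER,	/* fall back from cache to server */
	MODEL_FAIL		/* give up on this slice */
};

struct model_result {
	enum model_source source;
	size_t len;		/* size of the slice */
	size_t transferred;	/* total bytes read so far */
	size_t last_xfer;	/* bytes added by the latest attempt */
	int error;		/* 0 or a negative error code */
	bool clear_tail;	/* models NETFS_SREQ_CLEAR_TAIL */
};

/*
 * Cut [start, start + len) into consecutive slices of at most rsize bytes,
 * each beginning where the previous one ended.  Returns the number of slices
 * stored in 'out', or -1 if rsize is 0 or 'out' is too small.
 */
static int model_slice_request(long long start, size_t len, size_t rsize,
			       struct model_slice *out, int max)
{
	int n = 0;

	if (rsize == 0)
		return -1;
	while (len > 0) {
		size_t part = len < rsize ? len : rsize;	/* the clamp */

		if (n == max)
			return -1;
		out[n].start = start;
		out[n].len = part;
		n++;
		start += (long long)part;
		len -= part;
	}
	return n;
}

/* Decide what to do with a slice once its read has terminated. */
static enum model_action model_assess(const struct model_result *r)
{
	if (r->error < 0)
		return r->source == MODEL_FROM_CACHE ?
			MODEL_REISSUE_SERVER : MODEL_FAIL;
	if (r->transferred >= r->len)
		return MODEL_DONE;
	if (r->clear_tail)
		return MODEL_CLEAR_TAIL;
	/* A short read is only retried if the last attempt made progress. */
	return r->last_xfer > 0 ? MODEL_REISSUE_SAME : MODEL_FAIL;
}
```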


Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work. If using fscache, for
example, the network filesystem would call::

	int fscache_begin_read_operation(struct netfs_read_request *rreq,
					 struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file.

The netfs_read_request object contains a place for the cache to hang its
state::

	struct netfs_cache_resources {
		const struct netfs_cache_ops *ops;
		void *cache_priv;
		void *cache_priv2;
	};

This contains an operations table pointer and two private pointers. The
operation table looks like the following::

	struct netfs_cache_ops {
		void (*end_operation)(struct netfs_cache_resources *cres);

		void (*expand_readahead)(struct netfs_cache_resources *cres,
					 loff_t *_start, size_t *_len, loff_t i_size);

		enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq,
						       loff_t i_size);

		int (*read)(struct netfs_cache_resources *cres,
			    loff_t start_pos,
			    struct iov_iter *iter,
			    bool seek_data,
			    netfs_io_terminated_t term_func,
			    void *term_func_priv);

		int (*write)(struct netfs_cache_resources *cres,
			     loff_t start_pos,
			     struct iov_iter *iter,
			     netfs_io_terminated_t term_func,
			     void *term_func_priv);
	};

With a termination handler function pointer::

	typedef void (*netfs_io_terminated_t)(void *priv,
					      ssize_t transferred_or_error,
					      bool was_async);
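
As a sketch of how a termination handler of this shape might be used, the
following userspace model reads from an in-memory buffer and reports the
outcome through a callback. The ``model_cache`` structure,
``model_cache_read()`` and ``model_record_result()`` are hypothetical, a plain
byte buffer stands in for the iov_iter, and the use of -EIO for an
out-of-range read is an arbitrary choice made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/* Same shape as the typedef above, redeclared for this userspace model. */
typedef void (*model_io_terminated_t)(void *priv,
				      ssize_t transferred_or_error,
				      bool was_async);

/* Hypothetical in-memory backing store standing in for a cache. */
struct model_cache {
	const unsigned char *data;
	size_t size;
};

/*
 * Read up to 'len' bytes at 'pos' into 'buf', then report the outcome
 * through term_func.  The model completes synchronously, so was_async is
 * always false.  A read starting at or beyond the end reports -EIO.
 */
static int model_cache_read(const struct model_cache *cache, size_t pos,
			    unsigned char *buf, size_t len,
			    model_io_terminated_t term_func,
			    void *term_func_priv)
{
	ssize_t ret;

	if (pos >= cache->size) {
		ret = -EIO;
	} else {
		size_t n = cache->size - pos;

		if (n > len)
			n = len;
		memcpy(buf, cache->data + pos, n);
		ret = (ssize_t)n;
	}
	if (term_func)
		term_func(term_func_priv, ret, false);
	return 0;
}

/* Example handler: store the result in the ssize_t that priv points at. */
static void model_record_result(void *priv, ssize_t transferred_or_error,
				bool was_async)
{
	(void)was_async;
	*(ssize_t *)priv = transferred_or_error;
}
```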

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction. This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request. ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.
   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache. The start file offset is given
   along with an iterator to read to, which gives the length also. It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``write()``

   [Required] Called to write to the cache. The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function. The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.
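
The way a cache might pick a source for the next slice and trim it to its own
granularity can be sketched in userspace as follows. The enumeration mirrors
the four names listed above, but its values, the 4096-byte ``MODEL_BLOCK``
granule, the per-block ``cached`` array and ``model_prepare_read()`` itself
are assumptions made for this example and do not describe how any particular
cache tracks its contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the names listed above; the values are this model's own. */
enum model_read_source {
	MODEL_FILL_WITH_ZEROES,
	MODEL_DOWNLOAD_FROM_SERVER,
	MODEL_READ_FROM_CACHE,
	MODEL_INVALID_READ
};

#define MODEL_BLOCK 4096	/* hypothetical cache granule size */

/*
 * Decide where the slice at [start, start + *len) should come from and trim
 * *len so that the slice does not run past the end of one cache block.
 * 'cached' has one entry per block; true means the block is in the cache.
 */
static enum model_read_source model_prepare_read(long long start, size_t *len,
						 long long i_size,
						 const bool *cached,
						 size_t nr_blocks)
{
	size_t block, room;

	if (start < 0 || *len == 0)
		return MODEL_INVALID_READ;
	if (start >= i_size)
		return MODEL_FILL_WITH_ZEROES;	/* nothing stored beyond EOF */

	block = (size_t)(start / MODEL_BLOCK);
	room = MODEL_BLOCK - (size_t)(start % MODEL_BLOCK);
	if (*len > room)
		*len = room;	/* reduce to the cache granularity */

	if (block < nr_blocks && cached[block])
		return MODEL_READ_FROM_CACHE;
	return MODEL_DOWNLOAD_FROM_SERVER;
}
```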

Note that these methods are passed a pointer to the cache resource structure,
not the read request structure, as they could also be used in situations where
there isn't a read request structure, such as writing dirty data to the
cache.